The goal of this project is to predict the price of a smartphone from its characteristics by applying supervised learning techniques to an open data set obtained from kaggle.com.

1. Data Preprocessing and Visualization Tools

1.1 Data Preprocessing

First, we load the libraries that we are going to use throughout this study, and then the data set.

library(dplyr)      # selecting variables
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(mice)       # missing-data patterns and imputation
## 
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     cbind, rbind
library(ggplot2)    # plots
library(forecast)    # plots
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(gridExtra)  # plots
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(grid)       # plots
library(plotly)     # plots
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ✔ readr     2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ gridExtra::combine() masks dplyr::combine()
## ✖ plotly::filter()     masks mice::filter(), dplyr::filter(), stats::filter()
## ✖ dplyr::lag()         masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(MASS)
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:plotly':
## 
##     select
## 
## The following object is masked from 'package:dplyr':
## 
##     select
library(caret)      # machine learning
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(e1071)
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(VIM)
## Loading required package: colorspace
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
##      (status 2 uses the sf package in place of rgdal)
## VIM is ready to use.
## 
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
## 
## Attaching package: 'VIM'
## 
## The following object is masked from 'package:datasets':
## 
##     sleep
library(reshape2)   # melting data for plotting
## 
## Attaching package: 'reshape2'
## 
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(GGally)     # correlations
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(glmnet)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loaded glmnet 4.1-8
library(rpart)
library(pROC)
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## 
## The following object is masked from 'package:colorspace':
## 
##     coords
## 
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
library(class)
library(randomForest) # random forests machine learning
## Warning: package 'randomForest' was built under R version 4.3.2
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:gridExtra':
## 
##     combine
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
library(gbm)        # Gradient Boosting
## Warning: package 'gbm' was built under R version 4.3.2
## Loaded gbm 2.1.8.1
library(xgboost)
## Warning: package 'xgboost' was built under R version 4.3.2
## 
## Attaching package: 'xgboost'
## 
## The following object is masked from 'package:plotly':
## 
##     slice
## 
## The following object is masked from 'package:dplyr':
## 
##     slice
library(glmnet)     # Ridge Regression
library(leaflet)

Next, we load the data set and fix the random seed for reproducibility.

rm(list = ls())
data = read.csv("ndtv_data_final.csv")
glimpse(data)
## Rows: 1,359
## Columns: 22
## $ X                      <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ Name                   <chr> "OnePlus 7T Pro McLaren Edition", "Realme X2 Pr…
## $ Brand                  <chr> "OnePlus", "Realme", "Apple", "Apple", "LG", "O…
## $ Model                  <chr> "7T Pro McLaren Edition", "X2 Pro", "iPhone 11 …
## $ Battery.capacity..mAh. <int> 4085, 4000, 3969, 3110, 4000, 3800, 4085, 4300,…
## $ Screen.size..inches.   <dbl> 6.67, 6.50, 6.50, 6.10, 6.40, 6.55, 6.67, 6.80,…
## $ Touchscreen            <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ Resolution.x           <int> 1440, 1080, 1242, 828, 1080, 1080, 1440, 1440, …
## $ Resolution.y           <int> 3120, 2400, 2688, 1792, 2340, 2400, 3120, 3040,…
## $ Processor              <int> 8, 8, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,…
## $ RAM..MB.               <int> 12000, 6000, 4000, 4000, 6000, 8000, 8000, 1200…
## $ Internal.storage..GB.  <dbl> 256, 64, 64, 64, 128, 128, 256, 256, 128, 128, …
## $ Rear.camera            <dbl> 48.0, 64.0, 12.0, 12.0, 12.0, 48.0, 48.0, 12.0,…
## $ Front.camera           <dbl> 16, 16, 12, 12, 32, 16, 16, 10, 24, 20, 16, 16,…
## $ Operating.system       <chr> "Android", "Android", "iOS", "iOS", "Android", …
## $ Wi.Fi                  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ Bluetooth              <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ GPS                    <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes",…
## $ Number.of.SIMs         <int> 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2,…
## $ X3G                    <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",…
## $ X4G..LTE               <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",…
## $ Price                  <int> 58998, 27999, 106900, 62900, 49990, 34930, 5299…
set.seed(321)

The data set consists of the following variables:

  • X: index -> int
  • Name: Name of the Phone -> chr
  • Brand: Brand Name -> chr
  • Model: Model of the Phone -> chr
  • Battery capacity (mAh): Battery capacity in mAh -> int
  • Screen size (inches): Screen Size in Inches across opposite corners -> dbl
  • Touchscreen: Whether the phone is touchscreen supported or not -> chr
  • Resolution x: The resolution of the phone along the width of the screen -> int
  • Resolution y: The resolution of the phone along the height of the screen -> int
  • Processor: No. of processor cores -> int
  • RAM (MB): RAM available in phone in MB -> int
  • Internal storage: Internal Storage of phone in GB -> dbl
  • Rear camera: Resolution of rear camera in MP (0 if unavailable) -> dbl
  • Front camera: Resolution of front camera in MP (0 if unavailable) -> dbl
  • Operating system: OS used in phone -> chr
  • Wi-Fi: Whether phone has WiFi functionality -> chr
  • Bluetooth: Whether phone has Bluetooth functionality -> chr
  • GPS: Whether phone has GPS functionality -> chr
  • Number of SIMs: Number of SIM card slots in phone -> int
  • 3G: Whether phone has 3G network functionality -> chr
  • 4G/LTE: Whether phone has 4G/LTE network functionality -> chr
  • Price: Price of the phone in INR -> int

The data set has many variables, but some of them will not be used because we judge them to be poor predictors that only add noise. Those are:

  • X: index -> int -> The index carries no information about the phone’s price.

  • Name: Name of the Phone -> chr -> A long string that adds noise; the Brand alone is enough for this study.

  • Model: Model of the Phone -> chr -> A long string that adds noise; the Brand alone is enough for this study.

# We eliminate the 3 variables
# data = data %>% select(-X, -Name, -Model)

data$X = NULL
data$Name = NULL
data$Model = NULL
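
A note on the commented-out select() call: because MASS is attached after dplyr (see the masking messages in the loading output), an unqualified select() resolves to MASS::select, so the dplyr version has to be called as dplyr::select(). The sketch below uses a made-up toy data frame, not the real data, and shows a base R way to drop the same three columns in one step, with the namespace-qualified dplyr form as a comment.

```r
# Toy data frame with the three columns to drop (all values are made up)
toy = data.frame(X = 0:1,
                 Name = c("A 1", "B 2"),
                 Model = c("1", "2"),
                 Brand = c("A", "B"),
                 Price = c(100, 200))

# Base R: keep every column whose name is not in the drop list
drop_cols = c("X", "Name", "Model")
toy_small = toy[, !(names(toy) %in% drop_cols)]
names(toy_small)

# Equivalent dplyr call, qualified so that MASS::select cannot shadow it:
# toy_small = toy %>% dplyr::select(-X, -Name, -Model)
```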

Next, the character variables are converted into factors so that the supervised learning tools can treat them as categorical.

# Factorise maintaining the actual names
data$Brand = factor(data$Brand)   
data$Operating.system = factor(data$Operating.system)

# Factorise with 1's and 0's (1 = Yes; 0 = No)
data$Touchscreen = factor(data$Touchscreen, levels = c("Yes", "No"), 
                          labels = c(1, 0))
data$Wi.Fi = factor(data$Wi.Fi, levels = c("Yes", "No"), 
                          labels = c(1, 0))
data$Bluetooth = factor(data$Bluetooth, levels = c("Yes", "No"), 
                          labels = c(1, 0))
data$GPS = factor(data$GPS, levels = c("Yes", "No"), 
                          labels = c(1, 0))
data$X3G = factor(data$X3G, levels = c("Yes", "No"), 
                          labels = c(1, 0))
data$X4G..LTE = factor(data$X4G..LTE, levels = c("Yes", "No"), 
                          labels = c(1, 0))

# We will also factorise the Number of SIMs as it takes 1, 2 or 3
data$Number.of.SIMs = factor(data$Number.of.SIMs, levels = c(1, 2, 3),
                             labels = c(1, 2, 3))
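
The six Yes/No conversions above follow one pattern, so they could also be written with a small helper applied to a vector of column names. This is only a sketch: the helper name yes_no_to_factor and the toy data are hypothetical, and the level order (1 first, then 0) is copied from the code above.

```r
# Hypothetical helper: turn a "Yes"/"No" character vector into a 1/0 factor
yes_no_to_factor = function(x) {
  factor(x, levels = c("Yes", "No"), labels = c(1, 0))
}

# Toy data (made-up values) with two of the binary columns
toy = data.frame(Wi.Fi = c("Yes", "No", "Yes"),
                 GPS   = c("No", "No", "Yes"))

# Apply the helper to every listed column in one step
binary_cols = c("Wi.Fi", "GPS")
toy[binary_cols] = lapply(toy[binary_cols], yes_no_to_factor)
str(toy)

# Any value other than "Yes" or "No" would silently become NA, so check:
anyNA(toy)
```

On the real data, binary_cols would list Touchscreen, Wi.Fi, Bluetooth, GPS, X3G and X4G..LTE.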

Next, we convert the prices of the smartphones from Indian rupees to euros. The exchange rate at the time of writing is 1 INR = 0.011069359 €. We round the result to whole euros.

data$Price = round(data$Price * 0.011069359)
summary(data)
##        Brand     Battery.capacity..mAh. Screen.size..inches. Touchscreen
##  Intex    :117   Min.   :1010           Min.   :2.400        1:1342     
##  Samsung  :101   1st Qu.:2300           1st Qu.:5.000        0:  17     
##  Micromax : 71   Median :3000           Median :5.200                   
##  Lava     : 59   Mean   :2938           Mean   :5.291                   
##  Panasonic: 55   3rd Qu.:3500           3rd Qu.:5.700                   
##  Vivo     : 52   Max.   :6000           Max.   :7.300                   
##  (Other)  :904                                                          
##   Resolution.x     Resolution.y    Processor         RAM..MB.    
##  Min.   : 240.0   Min.   : 320   Min.   : 1.000   Min.   :   64  
##  1st Qu.: 720.0   1st Qu.:1280   1st Qu.: 4.000   1st Qu.: 1000  
##  Median : 720.0   Median :1280   Median : 4.000   Median : 2000  
##  Mean   : 811.5   Mean   :1491   Mean   : 5.551   Mean   : 2489  
##  3rd Qu.:1080.0   3rd Qu.:1920   3rd Qu.: 8.000   3rd Qu.: 3000  
##  Max.   :2160.0   Max.   :3840   Max.   :10.000   Max.   :12000  
##                                                                  
##  Internal.storage..GB.  Rear.camera      Front.camera      Operating.system
##  Min.   :  0.064       Min.   :  0.00   Min.   : 0.000   Android   :1299   
##  1st Qu.:  8.000       1st Qu.:  8.00   1st Qu.: 2.000   BlackBerry:  10   
##  Median : 16.000       Median : 12.20   Median : 5.000   Cyanogen  :  10   
##  Mean   : 30.655       Mean   : 12.07   Mean   : 7.038   iOS       :  17   
##  3rd Qu.: 32.000       3rd Qu.: 13.00   3rd Qu.: 8.000   Sailfish  :   1   
##  Max.   :512.000       Max.   :108.00   Max.   :48.000   Tizen     :   3   
##                                                          Windows   :  19   
##  Wi.Fi    Bluetooth GPS      Number.of.SIMs X3G      X4G..LTE     Price       
##  1:1351   1:1344    1:1251   1: 227         1:1214   1:1012   Min.   :   5.0  
##  0:   8   0:  15    0: 108   2:1131         0: 145   0: 347   1st Qu.:  53.0  
##                              3:   1                           Median :  77.0  
##                                                               Mean   : 126.9  
##                                                               3rd Qu.: 133.0  
##                                                               Max.   :1937.0  
## 

Looking at the summary, there appear to be no NA values. Nevertheless, we need to bear in mind that missing values in this data set can be recorded as 0 in the features Rear.camera and Front.camera. Let’s take a look at those values:

# Before anything we will use the library mice to be sure that there are no NAs
md.pattern(data, rotate.names = TRUE)
##  /\     /\
## {  `---'  }
## {  O   O  }
## ==>  V <==  No need for mice. This data set is completely observed.
##  \  \|/  /
##   `-----'

##      Brand Battery.capacity..mAh. Screen.size..inches. Touchscreen Resolution.x
## 1359     1                      1                    1           1            1
##          0                      0                    0           0            0
##      Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## 1359            1         1        1                     1           1
##                 0         0        0                     0           0
##      Front.camera Operating.system Wi.Fi Bluetooth GPS Number.of.SIMs X3G
## 1359            1                1     1         1   1              1   1
##                 0                0     0         0   0              0   0
##      X4G..LTE Price  
## 1359        1     1 0
##             0     0 0
# No NAs explicitly

# Amount of 0's
sum(data$Rear.camera == 0)  # 2 with no "normal" camera
## [1] 2
sum(data$Front.camera == 0) # 18 with no "selfie" camera
## [1] 18
# We see a small amount of 0 values

# Let's check those phones
data[which(data$Rear.camera == 0), ]
data[which(data$Front.camera == 0), ]

The phones with no rear camera are low-budget models whose specifications suggest a target audience that only wants to make calls. Most of the phones with no front camera follow the same pattern, so those 0’s are plausible and we keep the observations as they are. However, for two smartphones the 0 looks like a missing value: row 429 (an Oppo priced at 111 €) and row 645 (a Samsung priced at 398 €). One option would be to drop both rows. Nonetheless, the Samsung has good specifications, and comparing it with other Samsung phones in a similar price range shows that the front camera resolution is usually about 2/3 of the rear camera resolution, so we impute its front camera as 2/3 of its rear camera. For the Oppo, comparable models with a rear camera of around 13 MP have a front camera of 8 MP, so we set its front camera to 8 and continue with the study.

data[which(data$Brand == "Samsung" & data$Price > 300 & data$Price < 500), ] # comparison from where we deduce the Rear - Front camera Ratio
data[645, ]$Front.camera = round(data[645, ]$Rear.camera * (2 / 3))

# For the Oppo we do a similar analysis
data[which(data$Brand == "Oppo" & data$Price > 90 & data$Price < 130), ]
data[429, ]$Front.camera = 8
summary(data)
##        Brand     Battery.capacity..mAh. Screen.size..inches. Touchscreen
##  Intex    :117   Min.   :1010           Min.   :2.400        1:1342     
##  Samsung  :101   1st Qu.:2300           1st Qu.:5.000        0:  17     
##  Micromax : 71   Median :3000           Median :5.200                   
##  Lava     : 59   Mean   :2938           Mean   :5.291                   
##  Panasonic: 55   3rd Qu.:3500           3rd Qu.:5.700                   
##  Vivo     : 52   Max.   :6000           Max.   :7.300                   
##  (Other)  :904                                                          
##   Resolution.x     Resolution.y    Processor         RAM..MB.    
##  Min.   : 240.0   Min.   : 320   Min.   : 1.000   Min.   :   64  
##  1st Qu.: 720.0   1st Qu.:1280   1st Qu.: 4.000   1st Qu.: 1000  
##  Median : 720.0   Median :1280   Median : 4.000   Median : 2000  
##  Mean   : 811.5   Mean   :1491   Mean   : 5.551   Mean   : 2489  
##  3rd Qu.:1080.0   3rd Qu.:1920   3rd Qu.: 8.000   3rd Qu.: 3000  
##  Max.   :2160.0   Max.   :3840   Max.   :10.000   Max.   :12000  
##                                                                  
##  Internal.storage..GB.  Rear.camera      Front.camera      Operating.system
##  Min.   :  0.064       Min.   :  0.00   Min.   : 0.000   Android   :1299   
##  1st Qu.:  8.000       1st Qu.:  8.00   1st Qu.: 2.000   BlackBerry:  10   
##  Median : 16.000       Median : 12.20   Median : 5.000   Cyanogen  :  10   
##  Mean   : 30.655       Mean   : 12.07   Mean   : 7.067   iOS       :  17   
##  3rd Qu.: 32.000       3rd Qu.: 13.00   3rd Qu.: 8.000   Sailfish  :   1   
##  Max.   :512.000       Max.   :108.00   Max.   :48.000   Tizen     :   3   
##                                                          Windows   :  19   
##  Wi.Fi    Bluetooth GPS      Number.of.SIMs X3G      X4G..LTE     Price       
##  1:1351   1:1344    1:1251   1: 227         1:1214   1:1012   Min.   :   5.0  
##  0:   8   0:  15    0: 108   2:1131         0: 145   0: 347   1st Qu.:  53.0  
##                              3:   1                           Median :  77.0  
##                                                               Mean   : 126.9  
##                                                               3rd Qu.: 133.0  
##                                                               Max.   :1937.0  
## 
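
The 2/3 ratio used for the Samsung was read off the comparison table by eye. As a sketch of how such a ratio could be computed instead, the example below takes the median front/rear ratio over comparable phones and uses it to fill a suspicious 0. The toy values are made up, and using the median is an assumption of this sketch, not the method applied above.

```r
# Toy data (made-up values): three comparable phones and one suspicious 0
toy = data.frame(
  Rear.camera  = c(12, 16, 24, 12),
  Front.camera = c(8, 10, 16, 0)
)

# Front/rear ratio among the phones whose front camera is recorded
known = toy$Front.camera > 0
ratio = median(toy$Front.camera[known] / toy$Rear.camera[known])

# Fill the suspicious zeros with rear * ratio, rounded to whole megapixels
toy$Front.camera[!known] = round(toy$Rear.camera[!known] * ratio)
toy
```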

Next, we turn to outliers. Before drawing any conclusion about them, we must consider the nature of the topic and of the data set. Mobile phones vary widely, and the differences between devices are crucial: they define the target audience of each device. Some companies focus on a low-budget audience, others on reliability and power efficiency, others on high-end devices… Depending on a company’s market cap and strategy, its product range may also be wider than that of others.

In short, this part of the feature engineering only gives an idea of how the data set is distributed. For that purpose we use the 3-sigma rule, the IQR rule and distribution plots.

# 3-sigma rule and IQR (numerical variables only)
# Battery - (Just 3 "outliers")
mu = mean(data$Battery.capacity..mAh.)
sigma = sd(data$Battery.capacity..mAh.)
sum(data$Battery.capacity..mAh. < mu - 3 * sigma | 
      data$Battery.capacity..mAh. > mu + 3 * sigma)
## [1] 3
QI = quantile(data$Battery.capacity..mAh., 0.25)
QS = quantile(data$Battery.capacity..mAh., 0.75)
IQR = QS - QI
sum(data$Battery.capacity..mAh. < QI - 1.5*IQR | 
      data$Battery.capacity..mAh. > QS + 1.5*IQR)
## [1] 3
# Screen - (11 from 3-sigma and 22 from IQR)
mu = mean(data$Screen.size..inches.)
sigma = sd(data$Screen.size..inches.)
sum(data$Screen.size..inches. < mu - 3 * sigma | 
      data$Screen.size..inches. > mu + 3 * sigma)
## [1] 11
QI = quantile(data$Screen.size..inches., 0.25)
QS = quantile(data$Screen.size..inches., 0.75)
IQR = QS - QI
sum(data$Screen.size..inches. < QI - 1.5*IQR | 
      data$Screen.size..inches. > QS + 1.5*IQR)
## [1] 22
# Resolution X - (Just 3 "outliers")
mu = mean(data$Resolution.x)
sigma = sd(data$Resolution.x)
sum(data$Resolution.x < mu - 3 * sigma | 
      data$Resolution.x > mu + 3 * sigma)
## [1] 3
QI = quantile(data$Resolution.x, 0.25)
QS = quantile(data$Resolution.x, 0.75)
IQR = QS - QI
sum(data$Resolution.x < QI - 1.5*IQR | 
      data$Resolution.x > QS + 1.5*IQR)
## [1] 3
# Resolution Y - (5 from 3-sigma and 21 from IQR)
mu = mean(data$Resolution.y)
sigma = sd(data$Resolution.y)
sum(data$Resolution.y < mu - 3 * sigma | 
      data$Resolution.y > mu + 3 * sigma)
## [1] 5
QI = quantile(data$Resolution.y, 0.25)
QS = quantile(data$Resolution.y, 0.75)
IQR = QS - QI
sum(data$Resolution.y < QI - 1.5*IQR | 
      data$Resolution.y > QS + 1.5*IQR)
## [1] 21
# Processor (No outliers)
mu = mean(data$Processor)
sigma = sd(data$Processor)
sum(data$Processor < mu - 3 * sigma | 
      data$Processor > mu + 3 * sigma)
## [1] 0
QI = quantile(data$Processor, 0.25)
QS = quantile(data$Processor, 0.75)
IQR = QS - QI
sum(data$Processor < QI - 1.5*IQR | 
      data$Processor > QS + 1.5*IQR)
## [1] 0
# RAM - (33 outliers)
mu = mean(data$RAM..MB.)
sigma = sd(data$RAM..MB.)
sum(data$RAM..MB. < mu - 3 * sigma | 
      data$RAM..MB. > mu + 3 * sigma)
## [1] 33
QI = quantile(data$RAM..MB., 0.25)
QS = quantile(data$RAM..MB., 0.75)
IQR = QS - QI
sum(data$RAM..MB. < QI - 1.5*IQR | 
      data$RAM..MB. > QS + 1.5*IQR)
## [1] 33
# Internal Storage - (10 from 3-sigma and 79 from IQR)
mu = mean(data$Internal.storage..GB.)
sigma = sd(data$Internal.storage..GB.)
sum(data$Internal.storage..GB. < mu - 3 * sigma | 
      data$Internal.storage..GB. > mu + 3 * sigma)
## [1] 10
QI = quantile(data$Internal.storage..GB., 0.25)
QS = quantile(data$Internal.storage..GB., 0.75)
IQR = QS - QI
sum(data$Internal.storage..GB. < QI - 1.5*IQR | 
      data$Internal.storage..GB. > QS + 1.5*IQR)
## [1] 79
# Rear Camera - (51 from 3-sigma and 91 from IQR)
mu = mean(data$Rear.camera)
sigma = sd(data$Rear.camera)
sum(data$Rear.camera < mu - 3 * sigma | 
      data$Rear.camera > mu + 3 * sigma)
## [1] 51
QI = quantile(data$Rear.camera, 0.25)
QS = quantile(data$Rear.camera, 0.75)
IQR = QS - QI
sum(data$Rear.camera < QI - 1.5*IQR | 
      data$Rear.camera > QS + 1.5*IQR)
## [1] 91
# Front camera - (26 from 3-sigma and 80 from IQR)
mu = mean(data$Front.camera)
sigma = sd(data$Front.camera)
sum(data$Front.camera < mu - 3 * sigma | 
      data$Front.camera > mu + 3 * sigma)
## [1] 26
QI = quantile(data$Front.camera, 0.25)
QS = quantile(data$Front.camera, 0.75)
IQR = QS - QI
sum(data$Front.camera < QI - 1.5*IQR | 
      data$Front.camera > QS + 1.5*IQR)
## [1] 80
# Price - (35 from 3-sigma and 142 from IQR)
mu = mean(data$Price)
sigma = sd(data$Price)
sum(data$Price < mu - 3 * sigma | 
      data$Price > mu + 3 * sigma)
## [1] 35
QI = quantile(data$Price, 0.25)
QS = quantile(data$Price, 0.75)
IQR = QS - QI
sum(data$Price < QI - 1.5*IQR | 
      data$Price > QS + 1.5*IQR)
## [1] 142
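
The ten blocks above repeat the same two checks, so they can be folded into one function. The following is a minimal sketch: the helper name count_outliers and the toy vector are hypothetical. It also stores the interquartile range in a lower-case variable, so the name of the built-in stats::IQR() function is not reused.

```r
# Hypothetical helper: number of values flagged by the 3-sigma rule and by
# the 1.5 * IQR rule for one numeric vector
count_outliers = function(x) {
  mu = mean(x)
  sigma = sd(x)
  q = quantile(x, c(0.25, 0.75), names = FALSE)
  iqr = q[2] - q[1]
  c(three_sigma = sum(x < mu - 3 * sigma | x > mu + 3 * sigma),
    iqr_rule    = sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr))
}

# Toy vector (made-up values) with one extreme observation
toy = c(1:20, 100)
count_outliers(toy)   # both rules flag only the value 100

# On the real data this would be applied to every numeric column, e.g.
# sapply(data[sapply(data, is.numeric)], count_outliers)
```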

After this first look at the outliers, we see that RAM, internal storage, both cameras and Price have a considerable number of them.

# Seeing graphically the different distributions of the Numerical variables
data_numerical = data %>% dplyr::select(where(is.numeric))
# We standardise the numerical data (z-scores) so the boxplots are comparable
data_numerical = scale(data_numerical)
# Melt the data for plotting
melted_data = melt(data_numerical)

g1 = ggplot(melted_data, aes(x = Var2, y = value, fill = Var2)) +
     geom_boxplot() +
     scale_fill_manual(values = rainbow(11)) +
     theme(axis.text.x = element_text(angle = 45, hjust = 1))
g1

Although many values are flagged as outliers, in this domain they are legitimate and important observations, so we keep them. Before moving on to the Visualization part, we rescale the numerical predictors to the [0, 1] range with min-max normalisation; Price is left in euros.

# For future individual plots we save a non-normalised copy
data_old = data

data$Battery.capacity..mAh. = (data$Battery.capacity..mAh. - 
                                 min(data$Battery.capacity..mAh.)) / 
                                 (max(data$Battery.capacity..mAh.) - 
                                    min(data$Battery.capacity..mAh.))
data$Screen.size..inches. = (data$Screen.size..inches. - 
                                 min(data$Screen.size..inches.)) / 
                                 (max(data$Screen.size..inches.) - 
                                    min(data$Screen.size..inches.))

data$Resolution.x = (data$Resolution.x - min(data$Resolution.x)) / 
                    (max(data$Resolution.x) - min(data$Resolution.x))

data$Resolution.y = (data$Resolution.y - min(data$Resolution.y)) / 
                    (max(data$Resolution.y) - min(data$Resolution.y))

data$Processor = (data$Processor - min(data$Processor)) / 
                 (max(data$Processor) - min(data$Processor))

data$RAM..MB. = (data$RAM..MB. - min(data$RAM..MB.)) / 
                (max(data$RAM..MB.) - min(data$RAM..MB.))

data$Internal.storage..GB. = (data$Internal.storage..GB. - 
                                min(data$Internal.storage..GB.)) / 
                              (max(data$Internal.storage..GB.) - 
                              min(data$Internal.storage..GB.))

data$Rear.camera = (data$Rear.camera - min(data$Rear.camera)) / 
                 (max(data$Rear.camera) - min(data$Rear.camera))

data$Front.camera = (data$Front.camera - min(data$Front.camera)) / 
                    (max(data$Front.camera) - min(data$Front.camera))

#data$Price = (data$Price - min(data$Price)) / (max(data$Price) - min(data$Price))
glimpse(data)
## Rows: 1,359
## Columns: 19
## $ Brand                  <fct> OnePlus, Realme, Apple, Apple, LG, OnePlus, One…
## $ Battery.capacity..mAh. <dbl> 0.6162325, 0.5991984, 0.5929860, 0.4208417, 0.5…
## $ Screen.size..inches.   <dbl> 0.8714286, 0.8367347, 0.8367347, 0.7551020, 0.8…
## $ Touchscreen            <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Resolution.x           <dbl> 0.6250000, 0.4375000, 0.5218750, 0.3062500, 0.4…
## $ Resolution.y           <dbl> 0.7954545, 0.5909091, 0.6727273, 0.4181818, 0.5…
## $ Processor              <dbl> 0.7777778, 0.7777778, 0.5555556, 0.5555556, 0.7…
## $ RAM..MB.               <dbl> 1.0000000, 0.4973190, 0.3297587, 0.3297587, 0.4…
## $ Internal.storage..GB.  <dbl> 0.4999375, 0.1248906, 0.1248906, 0.1248906, 0.2…
## $ Rear.camera            <dbl> 0.4444444, 0.5925926, 0.1111111, 0.1111111, 0.1…
## $ Front.camera           <dbl> 0.3333333, 0.3333333, 0.2500000, 0.2500000, 0.6…
## $ Operating.system       <fct> Android, Android, iOS, iOS, Android, Android, A…
## $ Wi.Fi                  <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Bluetooth              <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ GPS                    <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Number.of.SIMs         <fct> 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2,…
## $ X3G                    <fct> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,…
## $ X4G..LTE               <fct> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,…
## $ Price                  <dbl> 653, 310, 1183, 696, 553, 387, 587, 882, 421, 2…
summary(data)
##        Brand     Battery.capacity..mAh. Screen.size..inches. Touchscreen
##  Intex    :117   Min.   :0.0000         Min.   :0.0000       1:1342     
##  Samsung  :101   1st Qu.:0.2585         1st Qu.:0.5306       0:  17     
##  Micromax : 71   Median :0.3988         Median :0.5714                  
##  Lava     : 59   Mean   :0.3865         Mean   :0.5901                  
##  Panasonic: 55   3rd Qu.:0.4990         3rd Qu.:0.6735                  
##  Vivo     : 52   Max.   :1.0000         Max.   :1.0000                  
##  (Other)  :904                                                          
##   Resolution.x     Resolution.y      Processor         RAM..MB.      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.2500   1st Qu.:0.2727   1st Qu.:0.3333   1st Qu.:0.07842  
##  Median :0.2500   Median :0.2727   Median :0.3333   Median :0.16220  
##  Mean   :0.2977   Mean   :0.3326   Mean   :0.5057   Mean   :0.20315  
##  3rd Qu.:0.4375   3rd Qu.:0.4545   3rd Qu.:0.7778   3rd Qu.:0.24598  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
##                                                                      
##  Internal.storage..GB.  Rear.camera       Front.camera       Operating.system
##  Min.   :0.00000       Min.   :0.00000   Min.   :0.00000   Android   :1299   
##  1st Qu.:0.01550       1st Qu.:0.07407   1st Qu.:0.04167   BlackBerry:  10   
##  Median :0.03113       Median :0.11296   Median :0.10417   Cyanogen  :  10   
##  Mean   :0.05976       Mean   :0.11176   Mean   :0.14724   iOS       :  17   
##  3rd Qu.:0.06238       3rd Qu.:0.12037   3rd Qu.:0.16667   Sailfish  :   1   
##  Max.   :1.00000       Max.   :1.00000   Max.   :1.00000   Tizen     :   3   
##                                                            Windows   :  19   
##  Wi.Fi    Bluetooth GPS      Number.of.SIMs X3G      X4G..LTE     Price       
##  1:1351   1:1344    1:1251   1: 227         1:1214   1:1012   Min.   :   5.0  
##  0:   8   0:  15    0: 108   2:1131         0: 145   0: 347   1st Qu.:  53.0  
##                              3:   1                           Median :  77.0  
##                                                               Mean   : 126.9  
##                                                               3rd Qu.: 133.0  
##                                                               Max.   :1937.0  
## 
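
The min-max scaling above applies (x - min) / (max - min) to each predictor in turn. The same step can be sketched with one helper applied to all numeric columns except the target. The helper name min_max and the toy data frame are hypothetical; note that a constant column would give a division by zero, which this sketch does not guard against.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range
min_max = function(x) (x - min(x)) / (max(x) - min(x))

# Toy data (made-up rows) with a numeric predictor, a factor and the target
toy = data.frame(Battery = c(1010, 3000, 6000),
                 Brand   = factor(c("A", "B", "C")),
                 Price   = c(50, 77, 1937))

# Scale every numeric column except the target, Price
to_scale = setdiff(names(toy)[sapply(toy, is.numeric)], "Price")
toy[to_scale] = lapply(toy[to_scale], min_max)
toy
```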

In the end, the data set that we will be working with from now on has the following features:

  • Brand: Brand Name -> fct with 76 levels of the different brands such as OnePlus, Xiaomi, Apple…

  • Battery capacity (mAh): Battery capacity in mAh -> dbl (normalised), int (non-normalised).

  • Screen size (inches): Screen Size in Inches across opposite corners -> dbl (normalised and non-normalised).

  • Touchscreen: Whether the phone is touchscreen supported or not -> fct (1 = yes; 0 = no).

  • Resolution x: The resolution of the phone along the width of the screen -> dbl (normalised), int (non-normalised).

  • Resolution y: The resolution of the phone along the height of the screen -> dbl (normalised), int (non-normalised).

  • Processor: No. of processor cores -> dbl (normalised), int (non-normalised).

  • RAM (MB): RAM available in phone in MB -> dbl (normalised), int (non-normalised).

  • Internal storage: Internal Storage of phone in GB -> dbl (normalised and non-normalised).

  • Rear camera: Resolution of rear camera in MP (0 if unavailable) -> dbl (normalised and non-normalised).

  • Front camera: Resolution of front camera in MP (0 if unavailable) -> dbl (normalised and non-normalised).

  • Operating system: OS used in phone -> fct with 7 levels of the different OS such as Android, iOS…

  • Wi-Fi: Whether phone has WiFi functionality -> fct (1 = yes; 0 = no).

  • Bluetooth: Whether phone has Bluetooth functionality -> fct (1 = yes; 0 = no).

  • GPS: Whether phone has GPS functionality -> fct (1 = yes; 0 = no).

  • Number of SIMs: Number of SIM card slots in phone -> fct with 3 levels (1, 2 or 3 SIMs).

  • 3G: Whether phone has 3G network functionality -> fct (1 = yes; 0 = no).

  • 4G/LTE: Whether phone has 4G/LTE network functionality -> fct (1 = yes; 0 = no).

  • Price: Price of the phone in € -> dbl (rounded to whole euros, not normalised).

1.2 Visualization

In this part of the study we carry out a visual analysis of the data to better understand how it and its features behave. We proceed feature by feature:

Amount of devices per Brand:

g2 = ggplot(data, aes(x = fct_infreq(Brand), fill = fct_infreq(Brand))) +
     geom_bar() +
     scale_fill_viridis_d(name = "Colours", direction = -1) +  # Use a color gradient
     theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
     # labs() only sets labels and cannot resize the figure, so width/height are dropped
     labs(title = "Amount of devices per Brand", x = "Brands", y = "Amount")
g2

The vast majority of cell phones are built by Intex, Samsung and Micromax.
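
The ranking behind such a bar chart can also be checked numerically. The following is a minimal base R sketch on a made-up vector of brand labels (not the real data); it sorts the counts in the same most-frequent-first order that fct_infreq() uses for the plot.

```r
# Toy vector of brand labels (illustrative, not the real data)
brands <- c("Intex", "Samsung", "Intex", "Apple", "Samsung", "Intex")

# Frequency of each brand, most frequent first
brand_counts <- sort(table(brands), decreasing = TRUE)
brand_counts

# Share of the three most frequent brands, in percent
top3_share <- sum(head(brand_counts, 3)) / length(brands) * 100
top3_share
```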

Percentages of Touchscreen, WiFi, Bluetooth, GPS, 3G and 4G devices:

# TOUCH SCREEN
# Count occurrences of each category in the 'Touchscreen' column
touchscreen_counts = table(data$Touchscreen)

# Calculate percentages
touchscreen_percentages = round(prop.table(touchscreen_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
touchscreen_df = data.frame(
  Touchscreen = names(touchscreen_counts),
  Count = as.numeric(touchscreen_counts),
  Percentage = touchscreen_percentages
)

# Create a pie chart with percentages using ggplot2
gPie1 = ggplot(touchscreen_df, aes(x = "", y = Count, fill = Touchscreen)) +
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start = 0) +
        geom_text(aes(label = paste0(Percentage.Freq, "%")), 
                      position = position_stack(vjust = 0.5), 
                      size = 5,
                      show.legend = FALSE) +
        labs(fill = "TOUCHSCREEN") +
        ggtitle("Distribution of Touchscreen") +
        theme_void() +
        scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# WIFI
# Count occurrences of each category in the 'Wi.Fi' column
wifi_counts <- table(data$Wi.Fi)

# Calculate percentages
wifi_percentages <- round(prop.table(wifi_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
wifi_df <- data.frame(
  Wi.Fi = names(wifi_counts),
  Count = as.numeric(wifi_counts),
  Percentage = wifi_percentages
)

# Create a pie chart with percentages using ggplot2 for Wi.Fi
gPie2 = ggplot(wifi_df, aes(x = "", y = Count, fill = Wi.Fi)) +
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start = 0) +
        geom_text(aes(label = paste0(Percentage.Freq, "%")), 
                      position = position_stack(vjust = 0.5), 
                      size = 5,
                      show.legend = FALSE) +
        labs(fill = "Wi.Fi") +
        ggtitle("Distribution of WiFi") +
        theme_void() +
        scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# BLUETOOTH
# Count occurrences of each category in the 'Bluetooth' column
bluetooth_counts = table(data$Bluetooth)

# Calculate percentages
bluetooth_percentages = round(prop.table(bluetooth_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
bluetooth_df = data.frame(
  Bluetooth = names(bluetooth_counts),
  Count = as.numeric(bluetooth_counts),
  Percentage = bluetooth_percentages
)

# Create a pie chart with percentages using ggplot2 for Bluetooth
gPie3 = ggplot(bluetooth_df, aes(x = "", y = Count, fill = Bluetooth)) +
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start = 0) +
        geom_text(aes(label = paste0(Percentage.Freq, "%")), 
                      position = position_stack(vjust = 0.5), 
                      size = 5,
                      show.legend = FALSE) +
        labs(fill = "Bluetooth") +
        ggtitle("Distribution of Bluetooth") +
        theme_void() +
        scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# GPS
# Count occurrences of each category in the 'GPS' column
gps_counts = table(data$GPS)

# Calculate percentages
gps_percentages = round(prop.table(gps_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
gps_df = data.frame(
  GPS = names(gps_counts),
  Count = as.numeric(gps_counts),
  Percentage = gps_percentages
)

# Create a pie chart with percentages using ggplot2 for GPS
gPie4 = ggplot(gps_df, aes(x = "", y = Count, fill = GPS)) +
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start = 0) +
        geom_text(aes(label = paste0(Percentage.Freq, "%")), 
                      position = position_stack(vjust = 0.5), 
                      size = 5,
                      show.legend = FALSE) +
        labs(fill = "GPS") +
        ggtitle("Distribution of GPS") +
        theme_void() +
        scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# 3G
# Count occurrences of each category in the 'X3G' column
x3g_counts = table(data$X3G)

# Calculate percentages
x3g_percentages = round(prop.table(x3g_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
x3g_df = data.frame(
  X3G = names(x3g_counts),
  Count = as.numeric(x3g_counts),
  Percentage = x3g_percentages
)

# Create a pie chart with percentages using ggplot2 for X3G
gPie5 = ggplot(x3g_df, aes(x = "", y = Count, fill = X3G)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  geom_text(aes(label = paste0(Percentage.Freq, "%")), 
            position = position_stack(vjust = 0.5), 
            size = 5,
            show.legend = FALSE) +
  labs(fill = "X3G") +
  ggtitle("Distribution of 3G") +
  theme_void() +
  scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# 4G LTE
# Count occurrences of each category in the 'X4G..LTE' column
x4g_lte_counts = table(data$X4G..LTE)

# Calculate percentages
x4g_lte_percentages = round(prop.table(x4g_lte_counts) * 100, 2)

# Create a data frame with the counts and percentages for plotting
x4g_lte_df = data.frame(
  X4G_LTE = names(x4g_lte_counts),
  Count = as.numeric(x4g_lte_counts),
  Percentage = x4g_lte_percentages
)

# Create a pie chart with percentages using ggplot2 for X4G..LTE
gPie6 = ggplot(x4g_lte_df, aes(x = "", y = Count, fill = X4G_LTE)) +
        geom_bar(stat = "identity", width = 1) +
        coord_polar("y", start = 0) +
        geom_text(aes(label = paste0(Percentage.Freq, "%")), 
                      position = position_stack(vjust = 0.5), 
                      size = 5,
                      show.legend = FALSE) +
        labs(fill = "X4G..LTE") +
        ggtitle("Distribution of 4G LTE") +
        theme_void() +
        scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# All the Pie charts together
g3 = grid.arrange(gPie1, gPie2, gPie3, gPie4, gPie5, gPie6, ncol = 3)
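
The six blocks above repeat the same count and percentage steps. As a hedged sketch, they could be factored into one small helper; the name share_table and its column names are hypothetical and not part of the original code. Wrapping the percentages in as.numeric() gives a single column called Percentage, instead of the Percentage.Var1 and Percentage.Freq pair that appears when a table object is passed to data.frame().

```r
# Build the count/percentage table for one categorical column
share_table <- function(x) {
  counts <- table(x)
  data.frame(
    Level      = names(counts),
    Count      = as.numeric(counts),
    Percentage = as.numeric(round(prop.table(counts) * 100, 2))
  )
}

# Toy example: 3 of 4 devices have the feature
toy_shares <- share_table(factor(c(1, 1, 0, 1), levels = c(0, 1)))
toy_shares
```

Each pie chart could then use aes(label = paste0(Percentage, "%")) on the resulting data frame.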

We see that the number of devices without a touchscreen, WiFi or Bluetooth is negligible. Phones without GPS or 3G make up around 10% of the data set, which suggests a market segment aimed at users who only want a phone for calls. However, the 25% of phones without a 4G LTE connection suggests that cheap phones probably lack that capability. This is studied in the next plot.

4G LTE vs 3G vs GPS in terms of Price:

# Create a grouped bar plot
g4 = ggplot(data, aes(x = GPS, y = Price, fill = factor(X3G))) +
  geom_bar(stat = "identity", position = "dodge", color = "black", alpha = 0.8) +
  labs(x = "GPS Capability", y = "Price", fill = "X3G Capability", 
       title = "Price vs. GPS and X3G Capabilities") +
  scale_fill_discrete(name = "X3G Capability", labels = c("No", "Yes")) +
  theme_minimal()
g4

g5 = ggplot(data, aes(x = GPS, y = Price, fill = X4G..LTE)) +
  geom_bar(stat = "identity", position = "dodge", color = "black", alpha = 0.8) +
  labs(x = "GPS Capability", y = "Price", fill = "4G LTE Capability", 
       title = "Price vs. GPS and 4G LTE Capabilities") +
  scale_fill_discrete(name = "4G LTE Capability", labels = c("No", "Yes")) +
  theme_minimal()
g5

g6 = ggplot(data, aes(x = GPS, y = Price, fill = X3G)) + 
     geom_boxplot() + facet_grid(rows = vars(X4G..LTE)) +
     labs(title = "Price vs. 4G LTE, GPS and X3G", x = "GPS", y = "Price") +
     ylim(c(0,750))
g6
## Warning: Removed 19 rows containing non-finite values (`stat_boxplot()`).

We see that low-budget phones tend to be devices without GPS and 3G. We also find a significant number of devices that have 4G LTE but not 3G.
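
The overlap between 3G and 4G LTE support can be counted directly with a cross-tabulation. The sketch below uses a made-up toy data frame, with the assumed 0/1 coding described earlier, rather than the real data.

```r
# Toy capability flags (illustrative values only)
toy <- data.frame(
  X3G = factor(c(1, 1, 0, 0, 1, 0), levels = c(0, 1)),
  X4G = factor(c(1, 0, 1, 0, 1, 1), levels = c(0, 1))
)

# Rows: 3G support; columns: 4G LTE support
tab <- table(X3G = toy$X3G, X4G = toy$X4G)
tab

# Devices without 3G that still have 4G LTE
no3g_with4g <- tab["0", "1"]
no3g_with4g
```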

Operating Systems:

OS and Prices:

g7 = ggplot(data_old) + aes(x = Operating.system, y = Price, 
                        fill = Operating.system) + 
     geom_boxplot() + theme(legend.position = "none") + 
     labs(title = "Operating System and their Prices", x = "OS", y = "Price")
g7

From the plot it can be deduced that iOS devices are the most expensive, while Cyanogen, Sailfish and Tizen devices are not. Also worth mentioning is the number of outliers among Android devices: most cost less than 300€, but there is a considerable number of high-end phones.

OS devices:

g8 = ggplot(data, aes(x = fct_infreq(Operating.system))) +
     geom_bar(fill = "skyblue", color = "black") +
     labs(x = "Operating System", y = "Count", title = "OS count")
g8

The vast majority of devices are Android and, surprisingly, there are more Windows phones than iOS devices.

Numerical Features:

Battery Capacity:

g9 = ggplot(data_old, aes(x = Battery.capacity..mAh.)) +
     geom_density(fill = "royalblue", color = "skyblue", alpha = 0.75) +
     labs(x = "Battery capacity (mAh)", y = "Density", title = "Distribution of Battery Capacity")
g9

From the distribution we deduce that there are three main smartphone categories: the majority of phones use a battery between 2000 and 3000 mAh; high-end phones sit around 4000 mAh; and those built for extreme battery life sit around 5000 mAh.

Cameras:

g10 = ggplot(data_old) + aes(x = Rear.camera, y = Front.camera) + 
  geom_count(color  ="lightslateblue") + geom_smooth(method = "lm") + 
  labs(title = "Rear vs Front camera", x = "Rear camera", y = "Front camera")
g10
## `geom_smooth()` using formula = 'y ~ x'

The majority of the phones have lower-end cameras. Also, most phones tend to have a better rear camera than front camera. Nonetheless, for front cameras in the 20 to 40 MP range, the front camera is usually better than the rear one.

g11 = ggplot(data_old)+aes(x = Rear.camera, y = Front.camera) + 
      geom_point(aes(color = Price)) + scale_color_continuous(trans = "log")
g11

Studying camera quality against price brings no surprises: the lower the camera quality, the lower the price.

Screen’s Resolution:

g12 = plot_ly(data = data_old, x = ~Resolution.x, y = ~Resolution.y, 
              type = "scatter", mode = "markers") %>%
      layout(title = "Screen's Resolution", xaxis = list(title = "X resolution"),
             yaxis = list(title = "Y resolution"))
g12

Phone screens seem to have 3 main resolutions: 480p (480 x 800), 720p (720 x 1280) and 1080p plus (1080 x 2200). (Note that the x and y resolutions are given in a vertical orientation, as we are studying smartphones and not televisions or other horizontal monitors.)

Screen Size vs Price by Most popular Brand:

ggplot(data_old %>% filter(Brand %in% c("Intex", "Samsung", "Micromax", "Lava",
                                     "Panasonic", "Vivo", "Xiaomi", "Apple"))) +
  aes(x = Screen.size..inches., y = Brand) +  # data is already filtered to these brands
  geom_violin(alpha = 0.3, fill = "skyblue") +  # Adjusted fill color
  geom_jitter(aes(color = Price)) +
  scale_color_viridis_c(trans = "log", direction = -1) +  # Reversed color scale orientation
  labs(title = "Screen Size vs Price by Brands", y = "Brands") + coord_flip()

We selected the brands with the highest number of smartphones, plus Apple, to compare prices. Within a given brand, smartphones with bigger screens are the most expensive; for instance, the smallest Lava device is also the cheapest Lava smartphone. Across brands, however, from 5.5 inches upwards screen size alone does not tell prices apart, although it does once the brand is taken into account.

2. Classification

In this section of the project, we delve into the implementation and evaluation of several supervised learning techniques. As we want to predict the price of a phone given its characteristics, we will divide the phones into two major price groups to classify: Cheap and Expensive.

Before applying any supervised classification technique, we inspect the correlations of the features with Price to see which variables are most relevant for prediction.

gCor1 = ggplot() + aes(x = cor(data_numerical)["Price",], 
                           y = reorder(names(cor(data_numerical)["Price",]),
                                       cor(data_numerical)["Price",])) +
        geom_col(fill = "mediumorchid1") + labs(title = "Correlations",
          x = "Correlation",y = "Variables") + theme_bw()
gCor1

gCor2 = ggcorr(data_numerical, label = TRUE)
gCor2
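
The ranking shown in the first correlation plot can also be obtained as a plain named vector. The sketch below is an illustration on simulated data, where the columns RAM, Storage and Price are made up; it computes the correlation matrix once and drops the trivial correlation of Price with itself.

```r
set.seed(1)  # reproducible toy data

# Simulated numerical features (illustrative only)
toy <- data.frame(RAM = rnorm(50))
toy$Storage <- toy$RAM + rnorm(50, sd = 0.5)
toy$Price   <- 2 * toy$RAM + rnorm(50)

# Correlation of every variable with Price, computed once
price_cor <- cor(toy)["Price", ]

# Drop Price itself and sort from strongest positive to strongest negative
price_cor <- sort(price_cor[names(price_cor) != "Price"], decreasing = TRUE)
price_cor
```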

Now we create the price groups by adding a new variable named PriceClass, which labels phones under 100€ as Cheap and the rest as Expensive:

data_old$PriceClass = factor(ifelse(data_old$Price < 100, "Cheap", "Expensive"))
levels(data_old$PriceClass)
## [1] "Cheap"     "Expensive"
count <- table(data_old$PriceClass)
percentages <- prop.table(count) * 100
percentages
## 
##     Cheap Expensive 
##  62.25166  37.74834
data_classification <- data_old
data_classification$Price = NULL

# We need to remove the Brand name too, since it is not useful for the classification
data_classification$Brand = NULL

Now we split the data into training and test sets.

spl = createDataPartition(data_classification$PriceClass, p = 0.8, list = FALSE)

PhonesTrain = data_classification[spl,]
PhonesTest = data_classification[-spl,]

t = table(PhonesTrain$PriceClass)
prop.table(t)
## 
##     Cheap Expensive 
## 0.6222426 0.3777574

We can see that the data set is reasonably balanced: cheap phones make up 62% of it and expensive ones 38%.
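
The idea of a stratified 80/20 split can be sketched in base R as follows. This is not caret's createDataPartition() algorithm, only a stand-in that samples 80% of the rows within each class; the toy labels and the seed value are assumptions. Setting a seed makes the split reproducible.

```r
set.seed(123)  # fixes the random draw so the split can be reproduced

# Toy class labels with roughly the proportions seen above
y <- factor(rep(c("Cheap", "Expensive"), times = c(62, 38)))

# Sample 80% of the row indices within each class
train_idx <- unlist(lapply(split(seq_along(y), y), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}), use.names = FALSE)

# Class proportions are preserved in the training part
prop.table(table(y[train_idx]))
```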

table(PhonesTrain$PriceClass, PhonesTrain$Internal.storage..GB.)
##            
##             0.16 0.512   1   2   3   4   8  16  32  64 128 256 512
##   Cheap        1     8   2   1   1  44 233 259 111  17   0   0   0
##   Expensive    0     0   0   0   0   5  23  83 115 125  53   6   1
ggplot(PhonesTrain, aes(x=PriceClass, fill = as.factor(Internal.storage..GB.))) + geom_bar()

ggplot(PhonesTrain, aes(x= as.factor(Internal.storage..GB.),fill = PriceClass)) + geom_bar()

The phones with the highest internal storage belong to the Expensive group, and the opposite holds for the lowest storage values. However, phones with 16 and 32 GB of storage are spread across both groups.

2.1 Logistic Regression

Because we have binary classification, we can use the standard glm function in R:

logit.model <- glm(PriceClass ~ ., family=binomial(link='logit'), data=PhonesTrain)
summary(logit.model)
## 
## Call:
## glm(formula = PriceClass ~ ., family = binomial(link = "logit"), 
##     data = PhonesTrain)
## 
## Coefficients:
##                              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                -5.159e+00  1.245e+00  -4.143 3.43e-05 ***
## Battery.capacity..mAh.     -1.328e-04  1.583e-04  -0.839 0.401588    
## Screen.size..inches.        1.409e-01  2.794e-01   0.504 0.614182    
## Touchscreen0               -9.129e-02  1.134e+00  -0.080 0.935853    
## Resolution.x                2.669e-03  1.200e-03   2.223 0.026190 *  
## Resolution.y               -1.111e-04  6.892e-04  -0.161 0.871935    
## Processor                   2.621e-02  5.697e-02   0.460 0.645402    
## RAM..MB.                    3.781e-04  1.803e-04   2.097 0.035963 *  
## Internal.storage..GB.       4.210e-02  1.012e-02   4.161 3.18e-05 ***
## Rear.camera                 8.740e-02  3.216e-02   2.717 0.006580 ** 
## Front.camera               -2.196e-02  2.529e-02  -0.868 0.385146    
## Operating.systemBlackBerry  2.007e+00  1.006e+00   1.994 0.046151 *  
## Operating.systemCyanogen   -5.328e-01  1.317e+00  -0.404 0.685862    
## Operating.systemiOS         2.651e+00  1.137e+00   2.332 0.019722 *  
## Operating.systemSailfish   -1.266e+01  8.827e+02  -0.014 0.988561    
## Operating.systemTizen      -1.113e+01  6.240e+02  -0.018 0.985765    
## Operating.systemWindows     9.766e-01  6.464e-01   1.511 0.130844    
## Wi.Fi0                      7.113e-01  1.496e+00   0.476 0.634418    
## Bluetooth0                 -1.439e+00  1.184e+00  -1.216 0.224053    
## GPS0                       -1.637e+00  4.745e-01  -3.450 0.000561 ***
## Number.of.SIMs2            -1.093e+00  2.749e-01  -3.975 7.04e-05 ***
## Number.of.SIMs3            -1.200e+01  8.827e+02  -0.014 0.989154    
## X3G0                       -1.627e+00  4.232e-01  -3.845 0.000121 ***
## X4G..LTE0                   9.475e-01  2.840e-01   3.336 0.000850 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1442.59  on 1087  degrees of freedom
## Residual deviance:  798.58  on 1064  degrees of freedom
## AIC: 846.58
## 
## Number of Fisher Scoring iterations: 13
probability <- predict(logit.model,newdata=PhonesTest, type='response')
head(probability)
##         7         8         9        12        17        18 
## 0.9999998 0.9999992 0.9999491 0.9995674 0.9999932 0.9936506
prediction <- as.factor(ifelse(probability > 0.5,"Expensive","Cheap"))
head(prediction)
##         7         8         9        12        17        18 
## Expensive Expensive Expensive Expensive Expensive Expensive 
## Levels: Cheap Expensive

The confusion matrix is:

conf_log_reg = confusionMatrix(prediction, PhonesTest$PriceClass)
conf_log_reg
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Cheap Expensive
##   Cheap       148        29
##   Expensive    21        73
##                                           
##                Accuracy : 0.8155          
##                  95% CI : (0.7641, 0.8598)
##     No Information Rate : 0.6236          
##     P-Value [Acc > NIR] : 5.346e-12       
##                                           
##                   Kappa : 0.6008          
##                                           
##  Mcnemar's Test P-Value : 0.3222          
##                                           
##             Sensitivity : 0.8757          
##             Specificity : 0.7157          
##          Pos Pred Value : 0.8362          
##          Neg Pred Value : 0.7766          
##              Prevalence : 0.6236          
##          Detection Rate : 0.5461          
##    Detection Prevalence : 0.6531          
##       Balanced Accuracy : 0.7957          
##                                           
##        'Positive' Class : Cheap           
## 

The accuracy obtained is good: 0.82 on the test set, well above the no-information rate of 0.62, with a Kappa of 0.60.
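
As a check, the main statistics can be recomputed by hand from the four counts of the confusion matrix above. A short sketch in base R, with Cheap as the positive class as in the output:

```r
# Counts copied from the confusion matrix (rows: prediction, columns: reference)
cm <- matrix(c(148, 21, 29, 73), nrow = 2,
             dimnames = list(Prediction = c("Cheap", "Expensive"),
                             Reference  = c("Cheap", "Expensive")))

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["Cheap", "Cheap"] / sum(cm[, "Cheap"])
specificity <- cm["Expensive", "Expensive"] / sum(cm[, "Expensive"])

# Cohen's Kappa: observed agreement corrected for chance agreement
expected <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2
kappa    <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, kappa = kappa), 4)
```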

2.1.1 Penalized Logistic Regression

Even though the dimension of the data set is not very high, we are going to try the penalized version, here a ridge penalty (alpha = 0) with lambda = 0.01:

p.logit.model <- glmnet(as.matrix(PhonesTrain[,-1]),PhonesTrain$PriceClass, family=c("binomial"), alpha=0, lambda=0.01)
## Warning in storage.mode(xd) <- "double": NAs introduced by coercion
probability <- predict(p.logit.model,as.matrix(PhonesTrain[,-1]), type='response')
## Warning in cbind2(1, newx) %*% nbeta: NAs introduced by coercion
prediction <- as.factor(ifelse(probability > 0.5,"Expensive","Cheap"))

conf_p_log_reg = confusionMatrix(prediction, PhonesTrain$PriceClass)
conf_p_log_reg
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Cheap Expensive
##   Cheap       612       125
##   Expensive    65       286
##                                           
##                Accuracy : 0.8254          
##                  95% CI : (0.8015, 0.8475)
##     No Information Rate : 0.6222          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6176          
##                                           
##  Mcnemar's Test P-Value : 1.866e-05       
##                                           
##             Sensitivity : 0.9040          
##             Specificity : 0.6959          
##          Pos Pred Value : 0.8304          
##          Neg Pred Value : 0.8148          
##              Prevalence : 0.6222          
##          Detection Rate : 0.5625          
##    Detection Prevalence : 0.6774          
##       Balanced Accuracy : 0.7999          
##                                           
##        'Positive' Class : Cheap           
## 

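An accuracy of 0.83 was obtained, with no appreciable improvement. Note that this figure is computed on the training set, so it is not directly comparable with the test accuracy of 0.82 above.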
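
The coercion warnings arise because as.matrix() on a data frame that mixes numbers and factors returns a character matrix. One possible alternative, sketched here on a made-up toy data frame, is to build the predictor matrix with model.matrix(), which expands factors into numeric dummy columns. The resulting x and y could then be passed to glmnet() in place of the as.matrix() call.

```r
# Toy data with one numeric and one factor predictor (illustrative only)
toy <- data.frame(
  PriceClass = factor(c("Cheap", "Expensive", "Cheap", "Expensive")),
  RAM        = c(1000, 4000, 2000, 3000),
  GPS        = factor(c(1, 1, 0, 1), levels = c(1, 0))
)

# Mixed columns are coerced to character, hence the NAs later on
is.character(as.matrix(toy[, -1]))

# model.matrix() creates numeric dummies; [, -1] drops the intercept column
x <- model.matrix(PriceClass ~ ., data = toy)[, -1]
y <- toy$PriceClass
x
```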

2.1.2 ROC curve

The ROC curve shows the true positive rate against the false positive rate across different thresholds:

  • y-axis = Sensitivity (true positive rate)
  • x-axis = 1 - Specificity (false positive rate), since the plot is drawn with legacy.axes = TRUE
model <- lda(PriceClass ~ ., data=PhonesTrain, prior = c(.9, .1))

probability = predict(model, PhonesTest)$posterior

roc.lda <- roc(PhonesTest$PriceClass,probability[,2])
## Setting levels: control = Cheap, case = Expensive
## Setting direction: controls < cases
auc(roc.lda) 
## Area under the curve: 0.9008
plot.roc(PhonesTest$PriceClass, probability[,2],col="darkblue", print.auc = TRUE,  auc.polygon=TRUE, grid=c(0.1, 0.2), 
grid.col=c("green", "red"), max.auc.polygon=TRUE,auc.polygon.col="lightblue", print.thres=TRUE, legacy.axes = TRUE)
## Setting levels: control = Cheap, case = Expensive
## Setting direction: controls < cases

The AUC is 0.901, which means that the model discriminates very well between the two classes.

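A threshold around 0.05 seems to be the most balanced one; such a low value is consistent with the prior c(.9, .1) used in this model, which pulls the posterior probability of Expensive down. However, a company may need a different one depending on its interests.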
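
The AUC has a simple interpretation: it is the probability that a randomly chosen Expensive phone receives a higher score than a randomly chosen Cheap one. The sketch below computes it from ranks in base R; the helper name auc_rank and the toy scores are hypothetical, and this is not how pROC computes it internally.

```r
# AUC via the rank-sum (Mann-Whitney) formulation
auc_rank <- function(scores, labels, positive) {
  is_pos <- labels == positive
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r      <- rank(scores)  # average ranks handle tied scores
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: one Cheap phone is scored above one Expensive phone
labels <- c("Cheap", "Cheap", "Expensive", "Expensive")
scores <- c(0.10, 0.40, 0.35, 0.80)
toy_auc <- auc_rank(scores, labels, positive = "Expensive")
toy_auc  # 0.75
```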

2.2 Bayes Classifiers

2.2.1 LDA

We are going to start with LDA (Linear Discriminant Analysis), which assumes a common covariance matrix for both classes; variance is reduced by introducing some bias.

lda.model1 <- lda(PriceClass ~ ., data=PhonesTrain, prior = c(3/5, 2/5)) 
lda.model1
## Call:
## lda(PriceClass ~ ., data = PhonesTrain, prior = c(3/5, 2/5))
## 
## Prior probabilities of groups:
##     Cheap Expensive 
##       0.6       0.4 
## 
## Group means:
##           Battery.capacity..mAh. Screen.size..inches. Touchscreen0 Resolution.x
## Cheap                   2699.154             5.062718   0.02067947     681.7400
## Expensive               3307.139             5.650584   0.00243309     998.4088
##           Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## Cheap         1226.069  4.809453 1739.925              16.00481    9.008567
## Expensive     1898.706  6.722628 3656.934              53.63504   16.949392
##           Front.camera Operating.systemBlackBerry Operating.systemCyanogen
## Cheap         4.929985                 0.00295421              0.007385524
## Expensive    10.588078                 0.00973236              0.002433090
##           Operating.systemiOS Operating.systemSailfish Operating.systemTizen
## Cheap             0.001477105              0.001477105            0.00295421
## Expensive         0.036496350              0.000000000            0.00000000
##           Operating.systemWindows      Wi.Fi0 Bluetooth0       GPS0
## Cheap                  0.01329394 0.008862629 0.01624815 0.09748892
## Expensive              0.01946472 0.002433090 0.00243309 0.03406326
##           Number.of.SIMs2 Number.of.SIMs3       X3G0 X4G..LTE0
## Cheap           0.8847858     0.001477105 0.11225997 0.3190547
## Expensive       0.7396594     0.000000000 0.09489051 0.1630170
## 
## Coefficients of linear discriminants:
##                                      LD1
## Battery.capacity..mAh.     -2.251542e-05
## Screen.size..inches.       -1.161382e-01
## Touchscreen0                8.879698e-02
## Resolution.x                1.576805e-03
## Resolution.y                4.155798e-04
## Processor                   7.750866e-02
## RAM..MB.                    2.282514e-04
## Internal.storage..GB.       5.439563e-03
## Rear.camera                 1.245843e-02
## Front.camera                1.925067e-02
## Operating.systemBlackBerry  1.620904e+00
## Operating.systemCyanogen   -9.204590e-02
## Operating.systemiOS         1.711426e+00
## Operating.systemSailfish   -8.160616e-01
## Operating.systemTizen      -5.620493e-02
## Operating.systemWindows     7.314360e-01
## Wi.Fi0                      3.049030e-01
## Bluetooth0                 -6.131765e-01
## GPS0                       -6.193790e-01
## Number.of.SIMs2            -6.718253e-01
## Number.of.SIMs3            -8.304434e-01
## X3G0                       -6.299292e-01
## X4G..LTE0                   3.369768e-01

Note that prior = c(3/5, 2/5) is roughly the class proportions of the training set, hence the model is nearly equivalent to the default, which estimates the priors from the training data:

lda.model2 <- lda(PriceClass ~ ., data=PhonesTrain)
lda.model2
## Call:
## lda(PriceClass ~ ., data = PhonesTrain)
## 
## Prior probabilities of groups:
##     Cheap Expensive 
## 0.6222426 0.3777574 
## 
## Group means:
##           Battery.capacity..mAh. Screen.size..inches. Touchscreen0 Resolution.x
## Cheap                   2699.154             5.062718   0.02067947     681.7400
## Expensive               3307.139             5.650584   0.00243309     998.4088
##           Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## Cheap         1226.069  4.809453 1739.925              16.00481    9.008567
## Expensive     1898.706  6.722628 3656.934              53.63504   16.949392
##           Front.camera Operating.systemBlackBerry Operating.systemCyanogen
## Cheap         4.929985                 0.00295421              0.007385524
## Expensive    10.588078                 0.00973236              0.002433090
##           Operating.systemiOS Operating.systemSailfish Operating.systemTizen
## Cheap             0.001477105              0.001477105            0.00295421
## Expensive         0.036496350              0.000000000            0.00000000
##           Operating.systemWindows      Wi.Fi0 Bluetooth0       GPS0
## Cheap                  0.01329394 0.008862629 0.01624815 0.09748892
## Expensive              0.01946472 0.002433090 0.00243309 0.03406326
##           Number.of.SIMs2 Number.of.SIMs3       X3G0 X4G..LTE0
## Cheap           0.8847858     0.001477105 0.11225997 0.3190547
## Expensive       0.7396594     0.000000000 0.09489051 0.1630170
## 
## Coefficients of linear discriminants:
##                                      LD1
## Battery.capacity..mAh.     -2.251542e-05
## Screen.size..inches.       -1.161382e-01
## Touchscreen0                8.879698e-02
## Resolution.x                1.576805e-03
## Resolution.y                4.155798e-04
## Processor                   7.750866e-02
## RAM..MB.                    2.282514e-04
## Internal.storage..GB.       5.439563e-03
## Rear.camera                 1.245843e-02
## Front.camera                1.925067e-02
## Operating.systemBlackBerry  1.620904e+00
## Operating.systemCyanogen   -9.204590e-02
## Operating.systemiOS         1.711426e+00
## Operating.systemSailfish   -8.160616e-01
## Operating.systemTizen      -5.620493e-02
## Operating.systemWindows     7.314360e-01
## Wi.Fi0                      3.049030e-01
## Bluetooth0                 -6.131765e-01
## GPS0                       -6.193790e-01
## Number.of.SIMs2            -6.718253e-01
## Number.of.SIMs3            -8.304434e-01
## X3G0                       -6.299292e-01
## X4G..LTE0                   3.369768e-01

In practice, a bit better performance is attained if we shrink the prior probabilities towards 1/3
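
To see how the priors enter the result: a posterior is proportional to prior times likelihood, so posteriors obtained under one prior can be re-weighted to another without refitting. The sketch below illustrates this; the helper name reweight_posterior and the toy numbers are hypothetical.

```r
# Re-weight posteriors from old_prior to new_prior (columns = classes)
reweight_posterior <- function(post, old_prior, new_prior) {
  w <- sweep(post, 2, new_prior / old_prior, `*`)  # swap the prior factor
  w / rowSums(w)                                   # renormalise each row
}

# Toy posteriors for two phones
post <- matrix(c(0.60, 0.40,
                 0.10, 0.90), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("Cheap", "Expensive")))

new_post <- reweight_posterior(post, old_prior = c(0.6, 0.4),
                               new_prior = c(0.5, 0.5))
new_post
```

In the toy example, the first phone had a posterior equal to its prior, so with equal priors it becomes a 50/50 case.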

Output: posterior probabilities

probability = predict(lda.model2, newdata=PhonesTest)$posterior 
head(probability)
##           Cheap Expensive
## 7  0.0003965069 0.9996035
## 8  0.0002278220 0.9997722
## 9  0.0017598720 0.9982401
## 12 0.0395493058 0.9604507
## 17 0.0043647393 0.9956353
## 18 0.0780282858 0.9219717

To predict the price class, we apply the Bayes rule of maximum posterior probability:

prediction <- max.col(probability) 
head(prediction)
## [1] 2 2 2 2 2 2

which is equivalent to

prediction = predict(lda.model2, newdata=PhonesTest)$class 
head(prediction)
## [1] Expensive Expensive Expensive Expensive Expensive Expensive
## Levels: Cheap Expensive
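
The link between the two outputs is that max.col() returns column indices, which can be mapped back to the class labels through the column names of the posterior matrix. A small sketch with made-up posteriors follows; ties.method = "first" is set because the default breaks ties at random.

```r
# Toy posterior matrix (illustrative values only)
probability <- matrix(c(0.0004, 0.9996,
                        0.7000, 0.3000), nrow = 2, byrow = TRUE,
                      dimnames = list(NULL, c("Cheap", "Expensive")))

# Index of the most probable class in each row
idx <- max.col(probability, ties.method = "first")

# Map the indices back to labels
prediction <- factor(colnames(probability)[idx],
                     levels = colnames(probability))
prediction
```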

Performance

The confusion matrix: predictions in rows, true values in columns (but we can change the order)

conf_lda_matrix = confusionMatrix(prediction, PhonesTest$PriceClass)$table
conf_lda_matrix
##            Reference
## Prediction  Cheap Expensive
##   Cheap       145        30
##   Expensive    24        72
confusionMatrix(prediction, PhonesTest$PriceClass)$overall[1]
## Accuracy 
## 0.800738

2.2.2 QDA

#qda.model1 <- qda(PriceClass ~ ., data=PhonesTrain, prior = c(3/5, 2/5))
#qda.model1
#qda.model2 <- qda(PriceClass ~ ., data=PhonesTest, prior = c(3/5, 2/5))
#qda.model2

Performance:

#prediction = predict(qda.model2, newdata=PhonesTest)$class 
#confusionMatrix(prediction, PhonesTest$PriceClass)$table
#confusionMatrix(prediction, PhonesTest$PriceClass)$overall[1]

2.2.3 Benchmark Model

We have many predictors, hence our benchmark will be the penalized logistic regression, tuned by 5-fold cross-validation over alpha and lambda:
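
The tuning grid used in this benchmark can be inspected on its own before training. The sketch below rebuilds it with the same expand.grid() call and counts the candidate combinations.

```r
# Same grid as in the benchmark: 11 values of alpha times 6 values of lambda
grid <- expand.grid(alpha  = seq(0, 1, 0.1),
                    lambda = seq(0, 0.1, 0.02))

nrow(grid)   # number of candidate (alpha, lambda) pairs
head(grid, 3)
```

Here alpha = 0 corresponds to the ridge penalty and alpha = 1 to the lasso, with the values in between giving elastic-net mixtures.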

ctrl <- trainControl(method = "cv", number = 5,                      
                     classProbs = TRUE,                       
                     verboseIter = TRUE)
# We have many predictors, hence use penalized logistic regression 
lrFit <- train(PriceClass ~ .,                 
               method = "glmnet",               
               tuneGrid = expand.grid(alpha = seq(0, 1, 0.1), 
                                      lambda = seq(0, .1, 0.02)),     
               metric = "Kappa",         
               data = PhonesTrain,     
               preProcess = c("center", "scale"), 
               trControl = ctrl) 
## + Fold1: alpha=0.0, lambda=0.1 
## - Fold1: alpha=0.0, lambda=0.1 
## + Fold1: alpha=0.1, lambda=0.1 
## - Fold1: alpha=0.1, lambda=0.1 
## + Fold1: alpha=0.2, lambda=0.1 
## - Fold1: alpha=0.2, lambda=0.1 
## + Fold1: alpha=0.3, lambda=0.1 
## - Fold1: alpha=0.3, lambda=0.1 
## + Fold1: alpha=0.4, lambda=0.1 
## - Fold1: alpha=0.4, lambda=0.1 
## + Fold1: alpha=0.5, lambda=0.1 
## - Fold1: alpha=0.5, lambda=0.1 
## + Fold1: alpha=0.6, lambda=0.1 
## - Fold1: alpha=0.6, lambda=0.1 
## + Fold1: alpha=0.7, lambda=0.1 
## - Fold1: alpha=0.7, lambda=0.1 
## + Fold1: alpha=0.8, lambda=0.1 
## - Fold1: alpha=0.8, lambda=0.1 
## + Fold1: alpha=0.9, lambda=0.1 
## - Fold1: alpha=0.9, lambda=0.1 
## + Fold1: alpha=1.0, lambda=0.1 
## - Fold1: alpha=1.0, lambda=0.1 
## + Fold2: alpha=0.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.0, lambda=0.1 
## + Fold2: alpha=0.1, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.1, lambda=0.1 
## + Fold2: alpha=0.2, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.2, lambda=0.1 
## + Fold2: alpha=0.3, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.3, lambda=0.1 
## + Fold2: alpha=0.4, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.4, lambda=0.1 
## + Fold2: alpha=0.5, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.5, lambda=0.1 
## + Fold2: alpha=0.6, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.6, lambda=0.1 
## + Fold2: alpha=0.7, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.7, lambda=0.1 
## + Fold2: alpha=0.8, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.8, lambda=0.1 
## + Fold2: alpha=0.9, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.9, lambda=0.1 
## + Fold2: alpha=1.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=1.0, lambda=0.1 
## + Fold3: alpha=0.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.0, lambda=0.1 
## + Fold3: alpha=0.1, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.1, lambda=0.1 
## + Fold3: alpha=0.2, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.2, lambda=0.1 
## + Fold3: alpha=0.3, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.3, lambda=0.1 
## + Fold3: alpha=0.4, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.4, lambda=0.1 
## + Fold3: alpha=0.5, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.5, lambda=0.1 
## + Fold3: alpha=0.6, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.6, lambda=0.1 
## + Fold3: alpha=0.7, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.7, lambda=0.1 
## + Fold3: alpha=0.8, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.8, lambda=0.1 
## + Fold3: alpha=0.9, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.9, lambda=0.1 
## + Fold3: alpha=1.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=1.0, lambda=0.1 
## + Fold4: alpha=0.0, lambda=0.1 
## - Fold4: alpha=0.0, lambda=0.1 
## + Fold4: alpha=0.1, lambda=0.1 
## - Fold4: alpha=0.1, lambda=0.1 
## + Fold4: alpha=0.2, lambda=0.1 
## - Fold4: alpha=0.2, lambda=0.1 
## + Fold4: alpha=0.3, lambda=0.1 
## - Fold4: alpha=0.3, lambda=0.1 
## + Fold4: alpha=0.4, lambda=0.1 
## - Fold4: alpha=0.4, lambda=0.1 
## + Fold4: alpha=0.5, lambda=0.1 
## - Fold4: alpha=0.5, lambda=0.1 
## + Fold4: alpha=0.6, lambda=0.1 
## - Fold4: alpha=0.6, lambda=0.1 
## + Fold4: alpha=0.7, lambda=0.1 
## - Fold4: alpha=0.7, lambda=0.1 
## + Fold4: alpha=0.8, lambda=0.1 
## - Fold4: alpha=0.8, lambda=0.1 
## + Fold4: alpha=0.9, lambda=0.1 
## - Fold4: alpha=0.9, lambda=0.1 
## + Fold4: alpha=1.0, lambda=0.1 
## - Fold4: alpha=1.0, lambda=0.1 
## + Fold5: alpha=0.0, lambda=0.1 
## - Fold5: alpha=0.0, lambda=0.1 
## + Fold5: alpha=0.1, lambda=0.1 
## - Fold5: alpha=0.1, lambda=0.1 
## + Fold5: alpha=0.2, lambda=0.1 
## - Fold5: alpha=0.2, lambda=0.1 
## + Fold5: alpha=0.3, lambda=0.1 
## - Fold5: alpha=0.3, lambda=0.1 
## + Fold5: alpha=0.4, lambda=0.1 
## - Fold5: alpha=0.4, lambda=0.1 
## + Fold5: alpha=0.5, lambda=0.1 
## - Fold5: alpha=0.5, lambda=0.1 
## + Fold5: alpha=0.6, lambda=0.1 
## - Fold5: alpha=0.6, lambda=0.1 
## + Fold5: alpha=0.7, lambda=0.1 
## - Fold5: alpha=0.7, lambda=0.1 
## + Fold5: alpha=0.8, lambda=0.1 
## - Fold5: alpha=0.8, lambda=0.1 
## + Fold5: alpha=0.9, lambda=0.1 
## - Fold5: alpha=0.9, lambda=0.1 
## + Fold5: alpha=1.0, lambda=0.1 
## - Fold5: alpha=1.0, lambda=0.1 
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 0.9, lambda = 0 on full training set
print(lrFit) 
## glmnet 
## 
## 1088 samples
##   17 predictor
##    2 classes: 'Cheap', 'Expensive' 
## 
## Pre-processing: centered (23), scaled (23) 
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 870, 870, 870, 871, 871 
## Resampling results across tuning parameters:
## 
##   alpha  lambda  Accuracy   Kappa    
##   0.0    0.00    0.8170887  0.6019786
##   0.0    0.02    0.8170887  0.6019786
##   0.0    0.04    0.8170972  0.6023910
##   0.0    0.06    0.8143449  0.5957725
##   0.0    0.08    0.8115926  0.5891784
##   0.0    0.10    0.8125100  0.5913735
##   0.1    0.00    0.8244366  0.6170251
##   0.1    0.02    0.8180104  0.6037317
##   0.1    0.04    0.8161882  0.5997617
##   0.1    0.06    0.8161924  0.5989857
##   0.1    0.08    0.8171141  0.6007498
##   0.1    0.10    0.8171057  0.6007514
##   0.2    0.00    0.8244366  0.6170251
##   0.2    0.02    0.8207669  0.6099444
##   0.2    0.04    0.8171057  0.6015591
##   0.2    0.06    0.8207838  0.6095693
##   0.2    0.08    0.8161798  0.5985819
##   0.2    0.10    0.8189236  0.6031810
##   0.3    0.00    0.8244366  0.6170251
##   0.3    0.02    0.8216886  0.6117504
##   0.3    0.04    0.8198579  0.6073736
##   0.3    0.06    0.8189321  0.6047862
##   0.3    0.08    0.8161671  0.5968919
##   0.3    0.10    0.8161713  0.5952188
##   0.4    0.00    0.8244366  0.6170251
##   0.4    0.02    0.8189321  0.6052241
##   0.4    0.04    0.8216928  0.6110114
##   0.4    0.06    0.8198495  0.6062537
##   0.4    0.08    0.8161713  0.5952188
##   0.4    0.10    0.8106498  0.5810254
##   0.5    0.00    0.8244366  0.6170251
##   0.5    0.02    0.8198537  0.6066710
##   0.5    0.04    0.8189321  0.6040335
##   0.5    0.06    0.8198495  0.6050769
##   0.5    0.08    0.8143322  0.5905050
##   0.5    0.10    0.8060542  0.5688001
##   0.6    0.00    0.8244366  0.6170251
##   0.6    0.02    0.8207711  0.6088569
##   0.6    0.04    0.8180146  0.6022325
##   0.6    0.06    0.8152581  0.5939561
##   0.6    0.08    0.8088234  0.5769884
##   0.6    0.10    0.7996322  0.5540875
##   0.7    0.00    0.8244366  0.6170251
##   0.7    0.02    0.8216886  0.6107029
##   0.7    0.04    0.8189363  0.6041125
##   0.7    0.06    0.8161713  0.5953406
##   0.7    0.08    0.8023929  0.5612103
##   0.7    0.10    0.8014670  0.5573539
##   0.8    0.00    0.8244366  0.6170251
##   0.8    0.02    0.8216886  0.6107029
##   0.8    0.04    0.8180104  0.6014300
##   0.8    0.06    0.8088192  0.5784075
##   0.8    0.08    0.7996322  0.5540875
##   0.8    0.10    0.7977889  0.5477813
##   0.9    0.00    0.8253541  0.6188414
##   0.9    0.02    0.8207669  0.6088619
##   0.9    0.04    0.8161755  0.5970105
##   0.9    0.06    0.8042193  0.5669729
##   0.9    0.08    0.7977889  0.5482597
##   0.9    0.10    0.7959455  0.5427031
##   1.0    0.00    0.8253541  0.6188414
##   1.0    0.02    0.8226060  0.6132162
##   1.0    0.04    0.8143322  0.5929705
##   1.0    0.06    0.8005496  0.5580273
##   1.0    0.08    0.7959455  0.5436850
##   1.0    0.10    0.7950281  0.5404026
## 
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.9 and lambda = 0.
lrPred = predict(lrFit, PhonesTest) 
confusionMatrix(lrPred, PhonesTest$PriceClass)
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Cheap Expensive
##   Cheap       148        30
##   Expensive    21        72
##                                           
##                Accuracy : 0.8118          
##                  95% CI : (0.7601, 0.8566)
##     No Information Rate : 0.6236          
##     P-Value [Acc > NIR] : 1.418e-11       
##                                           
##                   Kappa : 0.592           
##                                           
##  Mcnemar's Test P-Value : 0.2626          
##                                           
##             Sensitivity : 0.8757          
##             Specificity : 0.7059          
##          Pos Pred Value : 0.8315          
##          Neg Pred Value : 0.7742          
##              Prevalence : 0.6236          
##          Detection Rate : 0.5461          
##    Detection Prevalence : 0.6568          
##       Balanced Accuracy : 0.7908          
##                                           
##        'Positive' Class : Cheap           
## 

The accuracy obtained is around 81% and Kappa is around 0.59; not bad, but there is room for improvement.
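As a quick check, both figures can be re-derived from the four counts in the confusion matrix printed above. The following is a minimal base R sketch; the object names (`cm`, `kappa_hat`, and so on) are illustrative and are not objects from the analysis.

```r
# Confusion matrix counts copied from the output above
# (rows = prediction, columns = reference; matrix() fills column by column)
cm <- matrix(c(148, 21, 30, 72), nrow = 2,
             dimnames = list(Prediction = c("Cheap", "Expensive"),
                             Reference  = c("Cheap", "Expensive")))

n <- sum(cm)

# Observed agreement: share of phones on the diagonal
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's Kappa: agreement beyond chance, rescaled to at most 1
kappa_hat <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = kappa_hat), 4)
```

Kappa is lower than accuracy because part of the 81% agreement would be expected by chance alone, given that about 62% of the test phones are Cheap.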

2.2.4 Naïve Bayes

A technology resale company needs to know whether buying a second-hand phone would be profitable. We therefore need some guidelines to estimate the price of a phone as well as possible without knowing what a person would actually pay for it.

We decided to assume the following costs of each possible outcome:

  • Cost of a true cheap is 0: the company buys the phone and sells it at the price we estimated.

  • Cost of a false expensive is 70: the company sells the phone cheaply when people would have paid for it even at a higher price.

  • Cost of a false cheap is 200 (the most problematic error): the company pays a lot for a phone that will not sell unless it is priced lower.

  • Cost of a true expensive is 0: the company buys the phone and sells it at the price we estimated.

Cost matrix:

Prediction/Reality   Cheap   Expensive
Cheap                    0          70
Expensive              200           0

Unit cost is then:

0*TN + 70*FP + 200*FN + 0*TP

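# Unit costs, in the order produced by as.vector() on a confusion table,
# which reads the table column by column:
# (pred Cheap, ref Cheap), (pred Expensive, ref Cheap),
# (pred Cheap, ref Expensive), (pred Expensive, ref Expensive)
cost.unit <- c(0, 70, 200, 0)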

Therefore, the unit cost for a naive classifier (no analytics knowledge) would be:

cost = 0*0.62 + 200*0 + 70*0.38 + 0*0 = 26.6, i.e. about 27 eur/phone on average

However, let’s study whether we can reduce this cost:

Let’s use the threshold from the ROC curve, which was 0.05

threshold = 0.05
lrProb = predict(lrFit, PhonesTest, type="prob")
lrPred = rep("Cheap", nrow(PhonesTest))
lrPred[which(lrProb[,2] > threshold)] = "Expensive"
confusionMatrix(factor(lrPred), PhonesTest$PriceClass)
## Confusion Matrix and Statistics
## 
##            Reference
## Prediction  Cheap Expensive
##   Cheap        42         0
##   Expensive   127       102
##                                        
##                Accuracy : 0.5314       
##                  95% CI : (0.47, 0.592)
##     No Information Rate : 0.6236       
##     P-Value [Acc > NIR] : 0.9992       
##                                        
##                   Kappa : 0.1993       
##                                        
##  Mcnemar's Test P-Value : <2e-16       
##                                        
##             Sensitivity : 0.2485       
##             Specificity : 1.0000       
##          Pos Pred Value : 1.0000       
##          Neg Pred Value : 0.4454       
##              Prevalence : 0.6236       
##          Detection Rate : 0.1550       
##    Detection Prevalence : 0.1550       
##       Balanced Accuracy : 0.6243       
##                                        
##        'Positive' Class : Cheap        
## 

Now we compute the cost per phone:

CM = confusionMatrix(factor(lrPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 32.80443
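The value above can be reproduced by hand from the confusion matrix printed for threshold 0.05. This is a small base R sketch with the counts typed in manually; `cm` is an illustrative name, and `cost.unit` repeats the vector defined earlier.

```r
# Counts copied from the threshold = 0.05 confusion matrix above
# (rows = prediction, columns = reference)
cm <- matrix(c(42, 127, 0, 102), nrow = 2,
             dimnames = list(Prediction = c("Cheap", "Expensive"),
                             Reference  = c("Cheap", "Expensive")))

cost.unit <- c(0, 70, 200, 0)

# as.vector() flattens the table column by column, so each unit cost
# is paired with the following cell:
data.frame(cell  = c("pred Cheap / ref Cheap", "pred Expensive / ref Cheap",
                     "pred Cheap / ref Expensive", "pred Expensive / ref Expensive"),
           count = as.vector(cm),
           unit  = cost.unit)

# Average cost per phone: total cost divided by the number of phones
cost <- sum(as.vector(cm) * cost.unit) / sum(cm)
cost
```

With this ordering, the 127 phones predicted Expensive whose reference class is Cheap are each charged 70, which gives 127 * 70 / 271, the figure reported above.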

The cost per unit obtained is about 33 eur, greater than the naive one. This tells us that a threshold suggested by the ROC curve is not necessarily the best option for our company. Instead, we must find the threshold value that minimizes the cost per phone.
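To illustrate the idea of searching for a cost-minimizing threshold, here is a minimal, self-contained sketch in base R. It uses simulated probabilities, not the phone data, so its numbers say nothing about the real models; `truth`, `p_expensive` and `cost_at` are hypothetical names introduced only for this example.

```r
# Simulated stand-in for predicted probabilities (NOT the phone data)
set.seed(1)
n <- 500
truth <- factor(sample(c("Cheap", "Expensive"), n, replace = TRUE,
                       prob = c(0.62, 0.38)),
                levels = c("Cheap", "Expensive"))

# Assumption: Expensive phones tend to receive higher scores
p_expensive <- ifelse(truth == "Expensive", rbeta(n, 4, 2), rbeta(n, 2, 4))

cost.unit  <- c(0, 70, 200, 0)        # same ordering as as.vector(table)
thresholds <- seq(0.05, 0.5, 0.05)

# Average cost per phone for one threshold
cost_at <- function(th) {
  # Fixing the levels keeps the table 2 x 2 even if one class is never predicted
  pred <- factor(ifelse(p_expensive > th, "Expensive", "Cheap"),
                 levels = c("Cheap", "Expensive"))
  cm <- table(Prediction = pred, Reference = truth)
  sum(as.vector(cm) * cost.unit) / sum(cm)
}

costs <- vapply(thresholds, cost_at, numeric(1))
best  <- thresholds[which.min(costs)]

data.frame(threshold = thresholds, cost = round(costs, 2))
best
```

On real data, a single split like this gives a noisy estimate; averaging the cost over many random train/test partitions for each threshold makes the comparison more stable.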

We tried the raw models with a fixed threshold to see our starting point:

paste0("Logistic Regression model costs: ",sum(as.vector(conf_log_reg$table)*cost.unit)/sum(conf_log_reg$table))
## [1] "Logistic Regression model costs: 26.8265682656827"
paste0("Penalized Logistic Regression model costs: ",sum(as.vector(conf_p_log_reg$table)*cost.unit)/sum(conf_p_log_reg$table))
## [1] "Penalized Logistic Regression model costs: 27.1599264705882"
paste0("ROC curve costs: ", cost)
## [1] "ROC curve costs: 32.8044280442804"
paste0("LDA model costs: ",sum(as.vector(conf_lda_matrix)*cost.unit)/sum(conf_lda_matrix))
## [1] "LDA model costs: 28.3394833948339"

The best one was the logistic regression model, at a cost of 26.83 eur/phone. Let us try to reduce this value by optimizing the threshold:

2.2.4.1 Cost-sensitive classifier

However, that cost was obtained with a fixed threshold. By optimizing the threshold we can obtain the best logistic regression model for our prediction:

cost.i = matrix(NA, nrow = 100, ncol = 10)
# 100 replicates of training/testing sets (rows) for each of the 10 threshold values (columns)

j <- 0
for (threshold in seq(0.05,0.5,0.05)){
  
  j <- j + 1
  
  cat(j)
  for(i in 1:100){
    
    # partition data into training (80%) and testing sets (20%)
    d <- createDataPartition(PhonesTrain$PriceClass, p = 0.8, list = FALSE)
    # select training sample
    
    train <- PhonesTrain[d,]
    test  <- PhonesTrain[-d,]  

    lrFit <- train(PriceClass ~ ., data=train, method = "glmnet",
                   tuneGrid = data.frame(alpha = 0.3, lambda = 0),
                   preProcess = c("center",
                                  "scale"),
                   trControl = trainControl(method = "none", classProbs = TRUE))
    
    lrProb = predict(lrFit, test, type="prob")
    lrPred = rep("Cheap", nrow(test))
    lrPred[which(lrProb[,2] > threshold)] = "Expensive"
    
    CM = confusionMatrix(factor(lrPred), test$PriceClass)$table
    cost = sum(as.vector(CM)*cost.unit)/sum(CM)
    cost
    
    cost.i[i,j] <- cost
    
  }
}
## 1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## 2
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## 3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## 4
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen, Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## 5
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## 6
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## 7
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Operating.systemTizen
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemBlackBerry,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
## 9
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish,
## Number.of.SIMs3
## 10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen
# Threshold optimization:
boxplot(cost.i, main = "Threshold selection",
        ylab = "unit cost",
        xlab = "threshold value",
        names = seq(0.05,0.5,0.05),col="royalblue2",las=2)

# values around 0.2 are reasonable
apply(cost.i, 2, median)
##  [1] 33.50230 26.61290 22.69585 22.94931 22.48848 23.15668 24.10138 25.89862
##  [9] 25.69124 27.05069

The lowest median cost, about 22.5 eur/phone, is reached at a threshold of 0.25. This is lower than the cost we got with the raw classifiers.
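
The selection rule can be sketched in a few lines of base R. The medians are copied from the output above; the names `thresholds`, `median.cost` and `best.threshold` are introduced here only for illustration.

```r
# Thresholds shown in the boxplot and their median unit costs
# (values copied from the apply(cost.i, 2, median) output above).
thresholds  <- seq(0.05, 0.5, 0.05)
median.cost <- c(33.50230, 26.61290, 22.69585, 22.94931, 22.48848,
                 23.15668, 24.10138, 25.89862, 25.69124, 27.05069)

# Pick the threshold with the lowest median cost.
best           <- which.min(median.cost)
best.threshold <- thresholds[best]
best.cost      <- median.cost[best]

best.threshold
best.cost
```

The costs for 0.15 to 0.30 differ by less than 1 eur/phone, so the choice within that range is not very sensitive.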

The final prediction using this threshold is:

threshold = 0.25
lrFit <- train(PriceClass ~ ., data=PhonesTrain, method = "glmnet",
               tuneGrid = data.frame(alpha = 0.3, lambda = 0), preProcess = c("center", "scale"),
               trControl = trainControl(method = "none", classProbs = TRUE))
lrProb = predict(lrFit, PhonesTest, type="prob")
lrPred = rep("Cheap", nrow(PhonesTest))
lrPred[which(lrProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(lrPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.45018

We obtained a cost of 18.5 eur/phone. This is the lowest cost obtained so far, so we set the threshold at 0.25.
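
To make the cost formula explicit, here is a minimal base R sketch of how the probability threshold changes the confusion matrix and therefore the unit cost. The unit costs, probabilities and labels below are hypothetical toy values, not the ones used in this study, and `table()` stands in for `confusionMatrix()$table`.

```r
# Hypothetical unit costs (NOT the study's cost.unit). The order follows
# as.vector() on a table with predictions in rows and true classes in columns:
#   (pred Cheap, true Cheap), (pred Expensive, true Cheap),
#   (pred Cheap, true Expensive), (pred Expensive, true Expensive)
cost.unit.demo <- c(0, 10, 50, 5)

lev   <- c("Cheap", "Expensive")
truth <- factor(c("Cheap", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"),
                levels = lev)
# Toy predicted probabilities of the class "Expensive"
prob.expensive <- c(0.10, 0.20, 0.20, 0.80, 0.30, 0.60)

unit_cost <- function(threshold) {
  pred <- factor(ifelse(prob.expensive > threshold, "Expensive", "Cheap"),
                 levels = lev)
  CM <- table(Prediction = pred, Reference = truth)
  # Average cost per phone: cell counts weighted by their unit costs
  sum(as.vector(CM) * cost.unit.demo) / sum(CM)
}

unit_cost(0.50)  # default threshold
unit_cost(0.25)  # lower threshold, fewer missed "Expensive" phones
```

In this toy example, missing an expensive phone is the costliest error, so lowering the threshold reduces the average cost.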

2.3 Machine Learning Tools

2.3.1 Decision Trees

library(rpart)

# Hyper-parameters
control = rpart.control(minsplit = 30, maxdepth = 10, cp=0.01)

# minsplit: minimum number of observations in a node before a split is attempted
# maxdepth: maximum depth of any node of the final tree
# cp: degree of complexity, the smaller the more branches

We fit a decision tree:

model = PriceClass ~.
dtFit <- rpart(model, data=PhonesTrain, method = "class", control = control)
summary(dtFit)
## Call:
## rpart(formula = model, data = PhonesTrain, method = "class", 
##     control = control)
##   n= 1088 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.48418491      0 1.0000000 1.0000000 0.03890980
## 2 0.01703163      1 0.5158151 0.5279805 0.03206880
## 3 0.01459854      3 0.4817518 0.5255474 0.03201318
## 4 0.01000000      4 0.4671533 0.5085158 0.03161632
## 
## Variable importance
##           Resolution.y           Resolution.x               RAM..MB. 
##                     22                     21                     13 
##  Internal.storage..GB.              Processor            Rear.camera 
##                     13                     12                     12 
##   Screen.size..inches. Battery.capacity..mAh.       Operating.system 
##                      3                      2                      1 
## 
## Node number 1: 1088 observations,    complexity param=0.4841849
##   predicted class=Cheap      expected loss=0.3777574  P(node) =1
##     class counts:   677   411
##    probabilities: 0.622 0.378 
##   left son=2 (717 obs) right son=3 (371 obs)
##   Primary splits:
##       Resolution.y          < 1532  to the left,  improve=171.6386, (0 missing)
##       Resolution.x          < 735   to the left,  improve=168.8961, (0 missing)
##       RAM..MB.              < 3500  to the left,  improve=150.6035, (0 missing)
##       Internal.storage..GB. < 24    to the left,  improve=147.3806, (0 missing)
##       Rear.camera           < 8.35  to the left,  improve=127.4458, (0 missing)
##   Surrogate splits:
##       Resolution.x          < 1052  to the left,  agree=0.978, adj=0.935, (0 split)
##       RAM..MB.              < 2500  to the left,  agree=0.835, adj=0.515, (0 split)
##       Internal.storage..GB. < 24    to the left,  agree=0.830, adj=0.501, (0 split)
##       Processor             < 5     to the left,  agree=0.826, adj=0.491, (0 split)
##       Rear.camera           < 13.05 to the left,  agree=0.803, adj=0.423, (0 split)
## 
## Node number 2: 717 observations,    complexity param=0.01703163
##   predicted class=Cheap      expected loss=0.1757322  P(node) =0.6590074
##     class counts:   591   126
##    probabilities: 0.824 0.176 
##   left son=4 (449 obs) right son=5 (268 obs)
##   Primary splits:
##       Rear.camera           < 8.35  to the left,  improve=19.93860, (0 missing)
##       Processor             < 5     to the left,  improve=14.56142, (0 missing)
##       Resolution.x          < 510   to the left,  improve=13.42088, (0 missing)
##       Internal.storage..GB. < 24    to the left,  improve=13.17083, (0 missing)
##       RAM..MB.              < 3500  to the left,  improve=10.42950, (0 missing)
##   Surrogate splits:
##       RAM..MB.               < 1500  to the left,  agree=0.805, adj=0.478, (0 split)
##       Screen.size..inches.   < 5.1   to the left,  agree=0.796, adj=0.455, (0 split)
##       Processor              < 5     to the left,  agree=0.784, adj=0.422, (0 split)
##       Internal.storage..GB.  < 24    to the left,  agree=0.775, adj=0.399, (0 split)
##       Battery.capacity..mAh. < 2915  to the left,  agree=0.762, adj=0.362, (0 split)
## 
## Node number 3: 371 observations
##   predicted class=Expensive  expected loss=0.2318059  P(node) =0.3409926
##     class counts:    86   285
##    probabilities: 0.232 0.768 
## 
## Node number 4: 449 observations
##   predicted class=Cheap      expected loss=0.08463252  P(node) =0.4126838
##     class counts:   411    38
##    probabilities: 0.915 0.085 
## 
## Node number 5: 268 observations,    complexity param=0.01703163
##   predicted class=Cheap      expected loss=0.3283582  P(node) =0.2463235
##     class counts:   180    88
##    probabilities: 0.672 0.328 
##   left son=10 (254 obs) right son=11 (14 obs)
##   Primary splits:
##       Screen.size..inches.  < 4.815 to the right, improve=13.327070, (0 missing)
##       Front.camera          < 3.1   to the right, improve= 9.208692, (0 missing)
##       RAM..MB.              < 3500  to the left,  improve= 4.744238, (0 missing)
##       Rear.camera           < 13.1  to the left,  improve= 4.400181, (0 missing)
##       Internal.storage..GB. < 48    to the left,  improve= 3.304299, (0 missing)
##   Surrogate splits:
##       Battery.capacity..mAh. < 1980  to the right, agree=0.966, adj=0.357, (0 split)
##       Resolution.x           < 735   to the left,  agree=0.966, adj=0.357, (0 split)
##       Operating.system       splits as  L-LR--R,   agree=0.966, adj=0.357, (0 split)
##       Processor              < 3     to the right, agree=0.963, adj=0.286, (0 split)
##       Front.camera           < 1.95  to the right, agree=0.959, adj=0.214, (0 split)
## 
## Node number 10: 254 observations,    complexity param=0.01459854
##   predicted class=Cheap      expected loss=0.2913386  P(node) =0.2334559
##     class counts:   180    74
##    probabilities: 0.709 0.291 
##   left son=20 (228 obs) right son=21 (26 obs)
##   Primary splits:
##       RAM..MB.              < 3500  to the left,  improve=6.082969, (0 missing)
##       Processor             < 6     to the left,  improve=4.443180, (0 missing)
##       Internal.storage..GB. < 48    to the left,  improve=3.591884, (0 missing)
##       Screen.size..inches.  < 5.475 to the left,  improve=2.791044, (0 missing)
##       Front.camera          < 14.5  to the left,  improve=2.184549, (0 missing)
##   Surrogate splits:
##       Internal.storage..GB. < 48    to the left,  agree=0.972, adj=0.731, (0 split)
##       Rear.camera           < 14.6  to the left,  agree=0.917, adj=0.192, (0 split)
##       Front.camera          < 18    to the left,  agree=0.909, adj=0.115, (0 split)
## 
## Node number 11: 14 observations
##   predicted class=Expensive  expected loss=0  P(node) =0.01286765
##     class counts:     0    14
##    probabilities: 0.000 1.000 
## 
## Node number 20: 228 observations
##   predicted class=Cheap      expected loss=0.254386  P(node) =0.2095588
##     class counts:   170    58
##    probabilities: 0.746 0.254 
## 
## Node number 21: 26 observations
##   predicted class=Expensive  expected loss=0.3846154  P(node) =0.02389706
##     class counts:    10    16
##    probabilities: 0.385 0.615
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.3.2
rpart.plot(dtFit, digits=3)

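A full tree would be obtained by setting the complexity parameter cp to 0 (split even if it does not improve the tree) and the minimum number of observations needed to split a node, minsplit, to its smallest value of 2. Here we grow a larger tree than before by lowering cp to 0.001, with minsplit = 40 and maxdepth = 12: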

control = rpart.control(minsplit = 40, maxdepth = 12, cp=0.001)
dtFit <- rpart(model, data=PhonesTrain, method = "class", control = control)

rpart.plot(dtFit, digits = 3)
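
For reference, a fully grown tree (cp = 0, minsplit = 2) followed by pruning might look like the sketch below. It uses `kyphosis`, a small example data set bundled with rpart, as a stand-in for the phone data. The helper `n.leaves` and the object names are introduced here for illustration only.

```r
library(rpart)

set.seed(1)  # the cross-validated error (xerror) depends on random folds

# Full tree: cp = 0 and minsplit = 2 let rpart keep splitting as long as it can.
full.tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(minsplit = 2, cp = 0))

# Choose the cp value with the lowest cross-validated error and prune back.
cp.table    <- full.tree$cptable
best.cp     <- cp.table[which.min(cp.table[, "xerror"]), "CP"]
pruned.tree <- prune(full.tree, cp = best.cp)

# Number of terminal nodes (leaves) before and after pruning
n.leaves <- function(tree) sum(tree$frame$var == "<leaf>")
c(full = n.leaves(full.tree), pruned = n.leaves(pruned.tree))
```

A full tree usually overfits the training data, which is why an intermediate cp value is typically preferred for prediction.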

Prediction:

dtPred <- predict(dtFit, PhonesTest, type = "class")

dtProb <- predict(dtFit, PhonesTest, type = "prob")
threshold = 0.3
dtPred = rep("Cheap", nrow(PhonesTest))
dtPred[which(dtProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(dtPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.81919

Using the decision tree algorithm, we obtain a cost of 18.8 eur/phone. However, the cost obtained with the cost-sensitive classifier is still better (18.5 eur/phone).

Now using Caret, we have:

library(caret)  
caret.fit <- train(model, data = PhonesTrain,
                   method = "rpart",
                   control=rpart.control(minsplit = 40, maxdepth = 12),
                   trControl = trainControl(method = "cv", number = 5), 
                   tuneLength=10) 
# caret.fit

Visualization

rpart.plot(caret.fit$finalModel) 

Prediction

dtProb <- predict(caret.fit, PhonesTest, type = "prob") 
threshold = 0.3
dtPred = rep("Cheap", nrow(PhonesTest)) 
dtPred[which(dtProb[,2] > threshold)] = "Expensive" 
CM = confusionMatrix(factor(dtPred), PhonesTest$PriceClass)$table 
cost = sum(as.vector(CM)*cost.unit)/sum(CM) 
cost
## [1] 18.85609

We obtained 18.9 eur/phone, practically the same cost as before, so this method brought no improvement.

Let's now try the Random Forest.

2.3.2 Random Forests

library(randomForest)
rf.train <- randomForest(PriceClass ~., data=PhonesTrain, 
                         ntree=200,
                         mtry=10,
                         cutoff=c(0.75,0.25),
                         importance=TRUE,
                         do.trace=TRUE)
## ntree      OOB      1      2
##     1:  20.80% 15.20% 30.20%
##     2:  22.78% 20.80% 25.88%
##     3:  21.92% 22.75% 20.58%
##     4:  21.68% 23.52% 18.70%
##     5:  22.87% 24.92% 19.52%
##     6:  23.75% 26.84% 18.65%
##     7:  23.35% 26.31% 18.48%
##     8:  23.89% 27.85% 17.49%
##     9:  24.49% 28.81% 17.44%
##    10:  24.17% 28.21% 17.56%
##    11:  23.68% 28.02% 16.59%
##    12:  23.27% 27.64% 16.10%
##    13:  22.58% 27.60% 14.36%
##    14:  23.13% 27.60% 15.82%
##    15:  22.72% 27.51% 14.84%
##    16:  22.54% 27.22% 14.84%
##    17:  23.00% 27.66% 15.33%
##    18:  22.72% 27.22% 15.33%
##    19:  23.90% 28.80% 15.82%
##    20:  23.53% 27.92% 16.30%
##    21:  23.44% 28.06% 15.82%
##    22:  23.71% 28.21% 16.30%
##    23:  24.08% 28.80% 16.30%
##    24:  24.36% 29.10% 16.55%
##    25:  23.99% 28.66% 16.30%
##    26:  23.99% 28.95% 15.82%
##    27:  23.44% 28.36% 15.33%
##    28:  23.53% 28.51% 15.33%
##    29:  23.62% 28.66% 15.33%
##    30:  22.98% 27.77% 15.09%
##    31:  23.25% 27.92% 15.57%
##    32:  23.53% 27.92% 16.30%
##    33:  23.44% 27.92% 16.06%
##    34:  23.62% 28.06% 16.30%
##    35:  23.25% 27.47% 16.30%
##    36:  23.53% 28.06% 16.06%
##    37:  23.62% 28.21% 16.06%
##    38:  23.25% 27.92% 15.57%
##    39:  22.98% 27.62% 15.33%
##    40:  23.16% 27.92% 15.33%
##    41:  22.98% 27.62% 15.33%
##    42:  22.61% 27.47% 14.60%
##    43:  22.70% 27.62% 14.60%
##    44:  22.43% 27.77% 13.63%
##    45:  22.52% 27.47% 14.36%
##    46:  22.43% 27.62% 13.87%
##    47:  22.52% 27.62% 14.11%
##    48:  22.15% 26.88% 14.36%
##    49:  22.33% 27.18% 14.36%
##    50:  22.24% 26.59% 15.09%
##    51:  22.15% 26.88% 14.36%
##    52:  22.15% 26.88% 14.36%
##    53:  22.06% 26.88% 14.11%
##    54:  21.97% 27.03% 13.63%
##    55:  21.69% 26.88% 13.14%
##    56:  21.42% 26.44% 13.14%
##    57:  21.78% 27.03% 13.14%
##    58:  21.97% 27.18% 13.38%
##    59:  22.06% 27.33% 13.38%
##    60:  22.06% 26.88% 14.11%
##    61:  21.78% 26.74% 13.63%
##    62:  21.97% 27.03% 13.63%
##    63:  21.88% 26.88% 13.63%
##    64:  21.88% 26.59% 14.11%
##    65:  22.61% 27.62% 14.36%
##    66:  22.52% 27.62% 14.11%
##    67:  22.61% 27.77% 14.11%
##    68:  22.61% 27.77% 14.11%
##    69:  22.79% 27.92% 14.36%
##    70:  22.79% 27.62% 14.84%
##    71:  22.89% 27.92% 14.60%
##    72:  22.89% 27.77% 14.84%
##    73:  22.70% 27.33% 15.09%
##    74:  22.98% 27.77% 15.09%
##    75:  22.89% 27.92% 14.60%
##    76:  22.43% 27.62% 13.87%
##    77:  22.89% 27.92% 14.60%
##    78:  22.61% 27.77% 14.11%
##    79:  22.43% 27.33% 14.36%
##    80:  22.43% 27.33% 14.36%
##    81:  22.43% 27.33% 14.36%
##    82:  22.52% 27.62% 14.11%
##    83:  22.43% 27.18% 14.60%
##    84:  22.24% 27.03% 14.36%
##    85:  22.33% 27.18% 14.36%
##    86:  22.52% 27.62% 14.11%
##    87:  22.43% 27.47% 14.11%
##    88:  22.79% 27.77% 14.60%
##    89:  22.24% 27.18% 14.11%
##    90:  22.89% 27.77% 14.84%
##    91:  22.98% 27.92% 14.84%
##    92:  22.89% 27.92% 14.60%
##    93:  22.70% 27.47% 14.84%
##    94:  22.89% 27.77% 14.84%
##    95:  22.79% 27.62% 14.84%
##    96:  22.79% 27.77% 14.60%
##    97:  23.25% 28.51% 14.60%
##    98:  23.35% 28.51% 14.84%
##    99:  23.16% 28.36% 14.60%
##   100:  23.07% 28.21% 14.60%
##   101:  23.07% 28.21% 14.60%
##   102:  22.98% 28.06% 14.60%
##   103:  22.79% 27.92% 14.36%
##   104:  22.98% 27.92% 14.84%
##   105:  23.16% 28.36% 14.60%
##   106:  22.98% 28.06% 14.60%
##   107:  23.07% 28.21% 14.60%
##   108:  23.16% 28.36% 14.60%
##   109:  23.07% 28.21% 14.60%
##   110:  23.07% 28.21% 14.60%
##   111:  22.70% 27.77% 14.36%
##   112:  22.70% 27.77% 14.36%
##   113:  23.07% 28.36% 14.36%
##   114:  23.07% 28.36% 14.36%
##   115:  22.89% 28.06% 14.36%
##   116:  22.79% 27.92% 14.36%
##   117:  22.89% 28.06% 14.36%
##   118:  22.89% 28.21% 14.11%
##   119:  23.25% 28.80% 14.11%
##   120:  22.98% 28.36% 14.11%
##   121:  23.25% 28.66% 14.36%
##   122:  23.16% 28.66% 14.11%
##   123:  22.98% 28.36% 14.11%
##   124:  22.70% 28.06% 13.87%
##   125:  22.70% 28.06% 13.87%
##   126:  22.52% 27.77% 13.87%
##   127:  22.52% 27.77% 13.87%
##   128:  22.43% 27.62% 13.87%
##   129:  22.70% 28.06% 13.87%
##   130:  22.43% 27.47% 14.11%
##   131:  22.33% 27.33% 14.11%
##   132:  22.24% 27.33% 13.87%
##   133:  22.43% 27.62% 13.87%
##   134:  22.52% 27.62% 14.11%
##   135:  22.79% 27.92% 14.36%
##   136:  22.89% 28.21% 14.11%
##   137:  22.89% 28.21% 14.11%
##   138:  22.89% 28.06% 14.36%
##   139:  22.89% 27.92% 14.60%
##   140:  22.70% 27.77% 14.36%
##   141:  22.52% 27.77% 13.87%
##   142:  22.89% 28.36% 13.87%
##   143:  22.70% 28.06% 13.87%
##   144:  22.70% 28.06% 13.87%
##   145:  22.52% 27.77% 13.87%
##   146:  22.61% 27.92% 13.87%
##   147:  22.70% 28.06% 13.87%
##   148:  22.70% 28.06% 13.87%
##   149:  22.61% 27.92% 13.87%
##   150:  22.61% 27.92% 13.87%
##   151:  22.61% 27.92% 13.87%
##   152:  22.61% 28.06% 13.63%
##   153:  22.61% 28.06% 13.63%
##   154:  22.79% 28.21% 13.87%
##   155:  22.89% 28.21% 14.11%
##   156:  22.61% 27.77% 14.11%
##   157:  22.52% 27.77% 13.87%
##   158:  22.52% 27.62% 14.11%
##   159:  22.43% 27.47% 14.11%
##   160:  22.43% 27.47% 14.11%
##   161:  22.52% 27.62% 14.11%
##   162:  22.52% 27.62% 14.11%
##   163:  22.52% 27.62% 14.11%
##   164:  22.52% 27.77% 13.87%
##   165:  22.43% 27.62% 13.87%
##   166:  22.43% 27.77% 13.63%
##   167:  22.33% 27.47% 13.87%
##   168:  22.33% 27.33% 14.11%
##   169:  22.61% 27.77% 14.11%
##   170:  22.52% 27.62% 14.11%
##   171:  22.52% 27.77% 13.87%
##   172:  22.61% 27.77% 14.11%
##   173:  22.33% 27.47% 13.87%
##   174:  22.52% 27.62% 14.11%
##   175:  22.52% 27.77% 13.87%
##   176:  22.24% 27.33% 13.87%
##   177:  22.24% 27.33% 13.87%
##   178:  22.15% 27.18% 13.87%
##   179:  22.15% 27.18% 13.87%
##   180:  22.33% 27.47% 13.87%
##   181:  22.33% 27.47% 13.87%
##   182:  22.24% 27.33% 13.87%
##   183:  22.52% 27.62% 14.11%
##   184:  22.33% 27.33% 14.11%
##   185:  22.24% 27.18% 14.11%
##   186:  22.43% 27.47% 14.11%
##   187:  22.33% 27.47% 13.87%
##   188:  22.61% 27.77% 14.11%
##   189:  22.52% 27.62% 14.11%
##   190:  22.43% 27.62% 13.87%
##   191:  22.43% 27.47% 14.11%
##   192:  22.43% 27.62% 13.87%
##   193:  22.52% 27.77% 13.87%
##   194:  22.52% 27.77% 13.87%
##   195:  22.43% 27.62% 13.87%
##   196:  22.43% 27.62% 13.87%
##   197:  22.52% 27.62% 14.11%
##   198:  22.61% 27.77% 14.11%
##   199:  22.52% 27.62% 14.11%
##   200:  22.52% 27.62% 14.11%
# mtry: number of variables randomly sampled as candidates at each split
# ntree: number of trees to grow
# cutoff: cutoff probabilities in majority vote
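
According to the randomForest documentation, the winning class is the one with the largest ratio of vote proportion to cutoff. With two classes and cutoff = c(0.75, 0.25), this amounts to predicting "Expensive" whenever more than 25% of the trees vote for it. The base R sketch below reproduces that rule on hypothetical vote fractions; it is an illustration, not a call into the package.

```r
# Hypothetical vote fractions for three phones (each row sums to 1).
votes <- matrix(c(0.80, 0.20,
                  0.70, 0.30,
                  0.40, 0.60),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("Cheap", "Expensive")))
cutoff <- c(Cheap = 0.75, Expensive = 0.25)

# Divide each column of votes by its cutoff; the class with the largest ratio wins.
ratio <- sweep(votes, 2, cutoff, "/")
pred  <- colnames(votes)[max.col(ratio, ties.method = "first")]
pred
```

The second phone gets only 30% of the votes for "Expensive" but is still labelled "Expensive", which is the same effect as lowering the probability threshold in the previous models.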

Prediction

rf.pred <- predict(rf.train, newdata=PhonesTest)
CM = confusionMatrix(factor(rf.pred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 19.5941

In this case, the cost obtained is 19.6 eur/phone. This is higher than the cost of the decision trees (about 18.8 eur/phone), and the cost-sensitive classifier (18.5 eur/phone) is still the best.

Now we are going to use caret to try to improve the random forest:

We define the specific function for the cost:

EconomicCost <- function(data, lev = NULL, model = NULL)  {   
  y.pred = data$pred    
  y.true = data$obs   
  CM = confusionMatrix(y.pred, y.true)$table   
  out = sum(as.vector(CM)*cost.unit)/sum(CM)   
  names(out) <- c("EconomicCost")   
  out 
  }

Now include this function in the Caret control:

ctrl <- trainControl(method = "cv", 
                     number = 5,                      
                     classProbs = TRUE,                       
                     summaryFunction = EconomicCost,                  
                     verboseIter=T)

Now train a RF using Caret with the specific metric:

rf.train <- train(PriceClass ~.,  
                  method = "rf",       
                  data = PhonesTrain,  
                  preProcess = c("center", "scale"),   
                  ntree = 200,           
                  cutoff=c(0.7,0.3),        
                  tuneGrid = expand.grid(mtry=c(6,8,10)), 
                  metric = "EconomicCost",     
                  maximize = F,          
                  trControl = ctrl)
## + Fold1: mtry= 6
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry= 6 
## + Fold1: mtry= 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry= 8 
## + Fold1: mtry=10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry=10 
## + Fold2: mtry= 6
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry= 6 
## + Fold2: mtry= 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry= 8 
## + Fold2: mtry=10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry=10 
## + Fold3: mtry= 6 
## - Fold3: mtry= 6 
## + Fold3: mtry= 8 
## - Fold3: mtry= 8 
## + Fold3: mtry=10 
## - Fold3: mtry=10 
## + Fold4: mtry= 6 
## - Fold4: mtry= 6 
## + Fold4: mtry= 8 
## - Fold4: mtry= 8 
## + Fold4: mtry=10 
## - Fold4: mtry=10 
## + Fold5: mtry= 6 
## - Fold5: mtry= 6 
## + Fold5: mtry= 8 
## - Fold5: mtry= 8 
## + Fold5: mtry=10 
## - Fold5: mtry=10 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 10 on full training set

Variable importance:

rf_imp <- varImp(rf.train, scale = F) 
plot(rf_imp, scales = list(y = list(cex = .95)))

Prediction:

rfPred = predict(rf.train, newdata=PhonesTest) 
CM = confusionMatrix(factor(rfPred), PhonesTest$PriceClass)$table 
cost = sum(as.vector(CM)*cost.unit)/sum(CM) 
cost
## [1] 18.52399

We can see that the cost would be 18.5 eur/phone. This is the best result among the tree-based models so far and practically matches the cost-sensitive classifier (18.45 eur/phone). However, we still need to try gradient boosting before choosing the best option.

2.3.3 Gradient Boosting

library(gbm)
GBM.train <- gbm(ifelse(PhonesTrain$PriceClass=="Cheap",0,1) ~., 
                 data=PhonesTrain, 
                 distribution= "bernoulli",
                 n.trees=250,
                 shrinkage = 0.01,
                 interaction.depth=2,
                 n.minobsinnode = 8) 

Prediction and cost

threshold = 0.3
gbmProb = predict(GBM.train, newdata=PhonesTest, n.trees=250, type="response") 
gbmPred = rep("Cheap", nrow(PhonesTest)) 
gbmPred[which(gbmProb > threshold)] = "Expensive" 
CM = confusionMatrix(factor(gbmPred), PhonesTest$PriceClass)$table 
cost = sum(as.vector(CM)*cost.unit)/sum(CM) 
cost
## [1] 18.37638

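We obtain 18.4 eur/phone, marginally below the best cost so far (18.45 eur/phone with the cost-sensitive classifier), so the improvement is small.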

Let's now try xgboost with caret. First, define a grid for the hyperparameters:

xgb_grid = expand.grid(nrounds = c(500,1000),   
                      eta = c(0.01, 0.001), # c(0.01,0.05,0.1)   
                      max_depth = c(2, 4, 6),  
                      gamma = 1,  
                      colsample_bytree = c(0.2, 0.4),  
                      min_child_weight = c(1,5),  
                      subsample = 1 )
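
Before training, it helps to check how large this grid is, since every combination is evaluated on each of the 5 cross-validation folds. A quick base R check (the names `n.candidates`, `n.folds` and `n.evals` are introduced here for illustration):

```r
# Same grid as above, rebuilt so that the snippet runs on its own.
xgb_grid <- expand.grid(nrounds = c(500, 1000),
                        eta = c(0.01, 0.001),
                        max_depth = c(2, 4, 6),
                        gamma = 1,
                        colsample_bytree = c(0.2, 0.4),
                        min_child_weight = c(1, 5),
                        subsample = 1)

n.candidates <- nrow(xgb_grid)          # 2 * 2 * 3 * 1 * 2 * 2 * 1 combinations
n.folds      <- 5                       # number = 5 in trainControl
n.evals      <- n.candidates * n.folds  # candidate/fold evaluations

c(candidates = n.candidates, evaluations = n.evals)
```

The training log only lists nrounds = 1000, which suggests that caret obtains the nrounds = 500 results from the same fitted models rather than refitting them, so the number of actual fits is likely lower than this count.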

Then, train

xgb.train = train(PriceClass ~ ., 
                  data=PhonesTrain,    
                  trControl = ctrl,   
                  metric="EconomicCost",    
                  maximize = F,      
                  tuneGrid = xgb_grid,  
                  preProcess = c("center", "scale"), 
                  method = "xgbTree" )
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:40:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:40:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:41:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:41:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:42:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:42:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:43:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:43:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:44:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:44:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## [09:45:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000 
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## [09:45:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000 
## Aggregating results
## Selecting tuning parameters
## Fitting nrounds = 1000, max_depth = 6, eta = 0.01, gamma = 1, colsample_bytree = 0.4, min_child_weight = 1, subsample = 1 on full training set

Variable importance:

xgb_imp <- varImp(xgb.train, scale = FALSE)
plot(xgb_imp, scales = list(y = list(cex = .95)))

Prediction and cost:

threshold = 0.3
xgbProb = predict(xgb.train, newdata=PhonesTest, type="prob") 
xgbPred = rep("Cheap", nrow(PhonesTest)) 
xgbPred[which(xgbProb[,2] > threshold)] = "Expensive" 
CM = confusionMatrix(factor(xgbPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 20.4797

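The cost reported for this model is 17.5 eur/phone, which is the best result obtained so far. Note that the run printed above gives about 20.48 eur/phone, so the exact figure changes from run to run.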
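To make the cost calculation easier to follow, here is a minimal sketch on made-up numbers. The probabilities, the labels and the `unit_cost` vector are hypothetical (the real `cost.unit` is defined earlier in the study); only the mechanics of thresholding, tabulating and weighting mirror the code above.

```r
# Hypothetical probabilities of the "Expensive" class and the true labels
prob_expensive <- c(0.10, 0.35, 0.80, 0.25, 0.60, 0.05)
truth <- factor(c("Cheap", "Cheap", "Expensive", "Expensive", "Expensive", "Cheap"),
                levels = c("Cheap", "Expensive"))

# Classify as "Expensive" whenever the probability exceeds the threshold
threshold <- 0.3
pred <- factor(ifelse(prob_expensive > threshold, "Expensive", "Cheap"),
               levels = c("Cheap", "Expensive"))

# Rows are predictions, columns are true classes
cm <- table(Prediction = pred, Reference = truth)

# Hypothetical unit costs, in the column-major order used by as.vector(cm):
# (pred Cheap, true Cheap), (pred Expensive, true Cheap),
# (pred Cheap, true Expensive), (pred Expensive, true Expensive)
unit_cost <- c(0, 10, 50, 0)

# Average cost per phone
avg_cost <- sum(as.vector(cm) * unit_cost) / sum(cm)
avg_cost
```

Lowering the threshold moves phones from the "Cheap" row to the "Expensive" row, which trades one kind of error for the other; the cost vector decides which trade is worth making.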

2.4 Conclusion of Classification

Now that we have finished the classification section, we can say that we managed to improve the savings of the fictional company by reducing the cost of each phone.

By selecting the best classifier, the Gradient Boosting one, we achieved a cost of 17.5 eur/phone. Compared with the naïve classifier, which gave a cost of 33 eur/phone, or the raw logistic regression classifier, with a cost of 27 eur/phone, the improvement may not seem large. However, if we take the difference of roughly 15 euros per phone with respect to the naïve classifier and multiply it by just 1,000 phones, we get about 15,000 euros of savings. For a big company, this small change can make a real difference.

To sum up, using these classification tools and selecting the best one can help a real company earn considerably more from each sale.
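The savings argument is plain arithmetic, sketched below with the per-phone costs quoted in this section. The volume of 1,000 phones is the illustrative figure used in the text, not a measured quantity.

```r
# Per-phone costs quoted in this section (eur/phone)
cost_naive    <- 33
cost_logistic <- 27
cost_xgb      <- 17.5

# Illustrative sales volume
n_phones <- 1000

# Total savings of Gradient Boosting against each baseline
savings_vs_naive    <- (cost_naive - cost_xgb) * n_phones
savings_vs_logistic <- (cost_logistic - cost_xgb) * n_phones
c(naive = savings_vs_naive, logistic = savings_vs_logistic)
```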

3. Advanced Regression

First we need to see which variables are most correlated with Price:

set.seed(123)
str(data) 
## 'data.frame':    1359 obs. of  19 variables:
##  $ Brand                 : Factor w/ 76 levels "10.or","Acer",..: 48 57 4 4 36 48 48 58 6 69 ...
##  $ Battery.capacity..mAh.: num  0.616 0.599 0.593 0.421 0.599 ...
##  $ Screen.size..inches.  : num  0.871 0.837 0.837 0.755 0.816 ...
##  $ Touchscreen           : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Resolution.x          : num  0.625 0.438 0.522 0.306 0.438 ...
##  $ Resolution.y          : num  0.795 0.591 0.673 0.418 0.574 ...
##  $ Processor             : num  0.778 0.778 0.556 0.556 0.778 ...
##  $ RAM..MB.              : num  1 0.497 0.33 0.33 0.497 ...
##  $ Internal.storage..GB. : num  0.5 0.125 0.125 0.125 0.25 ...
##  $ Rear.camera           : num  0.444 0.593 0.111 0.111 0.111 ...
##  $ Front.camera          : num  0.333 0.333 0.25 0.25 0.667 ...
##  $ Operating.system      : Factor w/ 7 levels "Android","BlackBerry",..: 1 1 4 4 1 1 1 1 1 1 ...
##  $ Wi.Fi                 : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Bluetooth             : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
##  $ GPS                   : Factor w/ 2 levels "1","0": 1 1 1 1 1 2 1 1 1 1 ...
##  $ Number.of.SIMs        : Factor w/ 3 levels "1","2","3": 2 2 2 2 1 2 2 2 1 2 ...
##  $ X3G                   : Factor w/ 2 levels "1","0": 1 1 1 1 2 1 1 1 1 2 ...
##  $ X4G..LTE              : Factor w/ 2 levels "1","0": 1 1 1 1 2 1 1 1 1 2 ...
##  $ Price                 : num  653 310 1183 696 553 ...
data_cor = data[,c(-1, -4, -12, -13, -14, -15, -16, -17, -18)] #Remove non-numerical variables 

corr_delay <- sort(cor(data_cor)["Price",], decreasing = T)
corr=data.frame(corr_delay) 

ggplot(corr, aes(x = row.names(corr), y = corr_delay)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  scale_x_discrete(limits = row.names(corr)) +
  labs(x = "", y = "Price", title = "Correlations") +
  theme(plot.title = element_text(hjust = 0, size = rel(1.5)),
        axis.text.x = element_text(angle = 45, hjust = 1))

The variable most correlated with Price is Internal.storage..GB., followed by RAM..MB. and the resolution. All the other variables still have a correlation above 0.25, so each of them shows some relationship with the price.

Recall the relationships between the variables:
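As a side note, the numerical columns can also be picked by type rather than by position. The sketch below uses a small made-up data frame (`phones`, with hypothetical columns `Storage` and `RAM`), not the real data set, and ranks the predictors by their correlation with the price.

```r
# Small hypothetical data frame: one factor and two numeric predictors
phones <- data.frame(
  Brand   = factor(rep(c("A", "B"), 10)),
  Storage = seq(0, 1, length.out = 20),
  RAM     = rep(c(0.2, 0.8), 10)
)
phones$Price <- 100 + 400 * phones$Storage + 100 * phones$RAM

# Keep only the numeric columns, whatever their position
numeric_cols <- phones[sapply(phones, is.numeric)]

# Correlation of every numeric predictor with Price, strongest first
price_cor <- cor(numeric_cols)["Price", ]
price_cor <- sort(price_cor[names(price_cor) != "Price"], decreasing = TRUE)
price_cor
```

Selecting by type avoids hard-coded column indices, which silently break if a column is added or reordered.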

gCor1

gCor2

cor_mat = cor(data_numerical)
heatmap(cor_mat)

We see that the screen’s resolution can be almost represented by one of its axes alone. This makes sense, as smartphones tend to keep a standard resolution (for example 1080p) that is adjusted depending on the height of the device (the y axis). Graph g12 shows this clearly.

g12

This suggests that we might get a better model by dropping Resolution.x. Even so, we will keep that feature for now.

Another detail worth mentioning is the relationship between the two memories, internal storage and RAM: they are strongly related, although not as strongly as the two resolutions.
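One common way to quantify how redundant two predictors are is the variance inflation factor (VIF). The sketch below is not part of the original analysis; it uses made-up values (`res_x` built as a near-multiple of `res_y`) to show the calculation.

```r
# Hypothetical, nearly collinear resolution variables
res_y <- seq(0.1, 1, length.out = 30)
res_x <- 0.6 * res_y + rep(c(-0.01, 0.01), 15)

# R-squared of one predictor regressed on the other
r2 <- summary(lm(res_x ~ res_y))$r.squared

# VIF = 1 / (1 - R^2); values far above 10 are usually read as strong collinearity
vif_x <- 1 / (1 - r2)
vif_x
```

With more than two predictors, the same formula applies with the predictor of interest regressed on all the others.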

The goal of this part is to build a regression model on numerical variables that predicts the price of a mobile phone from its characteristics. We create new splits, this time using Price as a numerical value.

# set.seed(123)
for_training = createDataPartition(log(data$Price), p = 0.75, list = FALSE)
# 75% for training
training = data[ for_training,]
testing = data[- for_training,]

From now on, we will use training and testing.
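For a numeric outcome, createDataPartition() samples within groups defined by percentiles of the outcome. The base R sketch below imitates that idea on hypothetical prices; it is an illustration of stratified splitting, not caret's actual implementation, and the five quantile bins are an assumption.

```r
set.seed(123)  # fixing the seed makes the split reproducible

# Hypothetical prices
price <- exp(rnorm(200, mean = 5, sd = 0.6))

# Five bins of log(price) delimited by its quintiles
log_price <- log(price)
bins <- cut(log_price,
            breaks = quantile(log_price, probs = seq(0, 1, by = 0.2)),
            include.lowest = TRUE)

# Sample 75% of the indices inside every bin
train_idx <- unlist(
  lapply(split(seq_along(price), bins),
         function(idx) sample(idx, size = round(0.75 * length(idx)))),
  use.names = FALSE
)
test_idx <- setdiff(seq_along(price), train_idx)

c(train = length(train_idx), test = length(test_idx))
```

Sampling inside each bin keeps cheap and expensive phones in similar proportions in both sets, which a plain random sample does not guarantee.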

3.1 Simple and Multiple Regression Models

As a first approach to build the best possible model, the fastest idea is to use simple and multiple regression models.

For the simple regression models, we will use the most correlated variables as we have seen in the correlation matrix above. Before that, let’s get some insights on the relationships between the most correlated variables.

Let’s see first the variability of Price:

training %>% ggplot(aes(x=Price)) + geom_density(fill="navyblue")

training %>% ggplot(aes(x=Price / Internal.storage..GB.)) + geom_density(fill="navyblue")
## Warning: Removed 1 rows containing non-finite values (`stat_density()`).

training %>% ggplot(aes(x= Price, y = Internal.storage..GB.)) + 
             geom_point(color="navyblue")  # Most "constant" variability

# training %>% ggplot(aes(x= log(Price), y = Internal.storage..GB.)) + 
#              geom_point(fill="navyblue")
# training %>% ggplot(aes(x= Price, y = log(Internal.storage..GB.))) + 
#              geom_point(fill="navyblue")
# training %>% ggplot(aes(x= log(Price), y = log(Internal.storage..GB.))) + 
#              geom_point(fill="navyblue")

We see that Price itself has a fairly low variability and that Price divided by internal storage has an even lower one. This indicates that Internal.storage..GB. is a feature we should not leave out. We also saw that taking logarithms did not help to reduce the variability. Therefore, our simple model will just use Price against Internal.storage..GB.
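One way to put a number on "variability" is the coefficient of variation (standard deviation divided by mean). The sketch below uses made-up values, including a phone with a normalised storage of 0, which is the kind of row that produces a non-finite ratio and the warning shown above.

```r
# Hypothetical prices and normalised storage (one phone has storage 0)
price   <- c(100, 200, 400, 150, 90)
storage <- c(0.25, 0.5, 1, 0.5, 0)

# Price per unit of storage; dividing by 0 gives Inf
ratio <- price / storage
n_non_finite <- sum(!is.finite(ratio))

# Coefficient of variation, ignoring non-finite values
cv <- function(x) {
  x <- x[is.finite(x)]
  sd(x) / mean(x)
}

c(price = cv(price), price_per_storage = cv(ratio))
```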

3.1.1 Simple Regression Model

# Simple regression model with just Price and Internal Storage
simple1 = lm(Price ~ Internal.storage..GB., data = training)
summary(simple1)  # poor result: adjusted R^2 = 0.4333
## 
## Call:
## lm(formula = Price ~ Internal.storage..GB., data = training)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -431.50  -37.65  -21.65    6.35  975.38 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             47.821      4.298   11.12   <2e-16 ***
## Internal.storage..GB. 1279.526     45.789   27.94   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 106.1 on 1019 degrees of freedom
## Multiple R-squared:  0.4338, Adjusted R-squared:  0.4333 
## F-statistic: 780.9 on 1 and 1019 DF,  p-value: < 2.2e-16
cor(predict(simple1, newdata = testing), testing$Price) ^ 2  # tested result
## [1] 0.3979553
par(mfrow=c(2,2))
plot(simple1, pch = 23 ,bg='mediumpurple3', cex = 2)

Despite an R-squared below 0.45, we seem to be on a “good path”: the residuals look flexible enough and the Normal Q-Q plot seems acceptable. However, the Scale-Location and Residuals vs Leverage plots show that there is plenty of room for improvement, as does the adjusted R-squared of 0.4333. Moreover, the R-squared on the testing set is below 0.40.
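The test-set figure above is the squared correlation between predictions and observed prices. The sketch below, on simulated data with hypothetical names (`storage`, `price`), compares it with two other common measures: 1 - SSE/SST and the RMSE.

```r
set.seed(1)

# Simulated data: price depends linearly on storage plus noise
n <- 200
d <- data.frame(storage = runif(n))
d$price <- 50 + 1200 * d$storage + rnorm(n, sd = 100)
train <- d[1:150, ]
test  <- d[151:200, ]

fit  <- lm(price ~ storage, data = train)
pred <- predict(fit, newdata = test)

# Squared correlation: ignores any bias or scale error in the predictions
r2_cor <- cor(pred, test$price)^2

# 1 - SSE/SST: penalises bias and scale error, so it is never larger
r2_sse <- 1 - sum((test$price - pred)^2) /
              sum((test$price - mean(test$price))^2)

# Root mean squared error, in the units of the price
rmse <- sqrt(mean((test$price - pred)^2))

c(r2_cor = r2_cor, r2_sse = r2_sse, rmse = rmse)
```

On the training data of a least-squares fit with an intercept the two R-squared definitions coincide; on new data the squared correlation can be the more optimistic of the two.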

3.1.2 Multiple Regression Model

Now, we will use more variables in search of the best model.

# multiple1 = lm(Price ~ Internal.storage..GB. + RAM..MB. + Resolution.y*
#                 Resolution.x + Screen.size..inches., 
#                data = training)
# summary(multiple1)

# multiple2 = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
#                  Resolution.x + Screen.size..inches., 
#                data = training)
# summary(multiple2)

# multiple3 = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
#                  Resolution.x + Screen.size..inches. + Rear.camera * Front.camera, 
#                data = training)
# summary(multiple3) # Adjusted R-squared:  0.593 

multiple4 = lm(Price ~ Internal.storage..GB.* RAM..MB. + 
                 Resolution.y * Resolution.x + 
                 Front.camera + Processor * Battery.capacity..mAh.,
               data = training)
summary(multiple4)
## 
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Resolution.y * 
##     Resolution.x + Front.camera + Processor * Battery.capacity..mAh., 
##     data = training)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -363.87  -32.83  -11.54   12.17  906.38 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        46.919     15.998   2.933 0.003435 ** 
## Internal.storage..GB.            -214.338    140.355  -1.527 0.127046    
## RAM..MB.                          118.966     50.464   2.357 0.018590 *  
## Resolution.y                       -2.494     81.804  -0.030 0.975685    
## Resolution.x                      -11.768     65.235  -0.180 0.856876    
## Front.camera                     -115.325     34.493  -3.343 0.000858 ***
## Processor                         -31.936     34.766  -0.919 0.358523    
## Battery.capacity..mAh.             19.075     45.798   0.417 0.677131    
## Internal.storage..GB.:RAM..MB.   1587.624    174.215   9.113  < 2e-16 ***
## Resolution.y:Resolution.x         586.521    118.884   4.934 9.44e-07 ***
## Processor:Battery.capacity..mAh.  -33.341     77.216  -0.432 0.665993    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 89.96 on 1010 degrees of freedom
## Multiple R-squared:  0.5967, Adjusted R-squared:  0.5927 
## F-statistic: 149.5 on 10 and 1010 DF,  p-value: < 2.2e-16
cor(predict(multiple4, newdata = testing), testing$Price)^2   # 0.4415989
## [1] 0.4415989

After trying several combinations, the best multiple regression model was the one in which we paired numerical variables through interaction terms. On the training set the result looked good, with an adjusted R-squared of 0.5927. However, when predicting on the testing set with that model we got only a bit more than 0.44. This is where we started to realise that we needed to bring the categorical variables into the numerical model somehow.

We consider that the Brand and Operating system of a mobile phone can be crucial factors in determining its price. Therefore, we will convert them from categorical to numerical and then normalise them. We then repeat the procedure for the multiple regression and look at the result (using re-created splits).

# Brand from categorical to normalised numerical
data$Brand = as.factor(data$Brand)
data$Brand = as.integer(data$Brand)
data$Brand = (data$Brand - min(data$Brand)) / (max(data$Brand) - min(data$Brand))

# Operating System from categorical to normalised numerical
data$Operating.system = as.factor(data$Operating.system)
data$Operating.system = as.integer(data$Operating.system)
data$Operating.system = (data$Operating.system - min(data$Operating.system)) / 
                        (max(data$Operating.system) - min(data$Operating.system))

# We will re-create the partitions
for_training = createDataPartition(log(data$Price), p = 0.75, list = FALSE)
# 75% for training
training = data[ for_training,]
testing = data[- for_training,]
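To see what this encoding produces, here is a minimal sketch on a made-up vector of brand names (the names and their order are only illustrative).

```r
# Hypothetical brand names
brand <- c("Samsung", "Apple", "Xiaomi", "Apple")

# Factor levels are sorted alphabetically: Apple = 1, Samsung = 2, Xiaomi = 3
codes <- as.integer(as.factor(brand))

# Min-max normalisation to the interval [0, 1]
scaled <- (codes - min(codes)) / (max(codes) - min(codes))

data.frame(brand, codes, scaled)
```

The resulting scale follows the alphabetical order of the levels, so distances between codes carry no price meaning. Keeping the variable as a factor, so that lm() builds one indicator per level, is a common alternative.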

After changing combinations and selecting variables, the best multiple regression model obtained was:

multipleBest = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
                 Resolution.x + Front.camera +
                 Brand * Operating.system, 
               data = training) 
summary(multipleBest) # Adjusted R-squared:  0.586
## 
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Resolution.y * 
##     Resolution.x + Front.camera + Brand * Operating.system, data = training)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -409.90  -34.49   -7.33   13.53  661.09 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       48.88      13.39   3.650 0.000275 ***
## Internal.storage..GB.           -169.15     145.43  -1.163 0.245055    
## RAM..MB.                         254.33      52.24   4.868 1.31e-06 ***
## Resolution.y                     -59.17      70.86  -0.835 0.403904    
## Resolution.x                     -95.34      58.50  -1.630 0.103459    
## Front.camera                     -93.05      32.32  -2.879 0.004068 ** 
## Brand                            -13.43      11.11  -1.209 0.226921    
## Operating.system                 472.07      50.12   9.419  < 2e-16 ***
## Internal.storage..GB.:RAM..MB.  1139.90     210.77   5.408 7.94e-08 ***
## Resolution.y:Resolution.x        626.66     100.38   6.243 6.31e-10 ***
## Brand:Operating.system          -641.95      88.24  -7.275 6.96e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87.1 on 1010 degrees of freedom
## Multiple R-squared:   0.59,  Adjusted R-squared:  0.586 
## F-statistic: 145.4 on 10 and 1010 DF,  p-value: < 2.2e-16
cor(predict(multipleBest, newdata = testing), testing$Price)^2 # 0.6151373
## [1] 0.6151373

Despite a lower adjusted R-squared on the training set (0.586), the R-squared on the testing set went from 0.44 to 0.615. Bear in mind that the splits were re-created, so the two testing sets are not identical. Even so, we can fairly say that this model is good at predicting the price in comparison to the simple regression model.

3.1.3 Model Selection

To check whether the model can be improved further, we now inspect the different combinations of predictors in a more automatic way. Several libraries automate this step, such as leaps or olsrr. Here we use olsrr, which can select the best model by looking at all possible subsets, the best subsets, forward and backward stepwise selection, and AIC-based selection. The method that gave us the best results was ols_step_best_subset().

(For the sake of clarity and understanding the project, we will not include all the selecting methods, just the one that got us the best results).
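What a best-subset search does can be sketched in base R by fitting every combination of predictors and keeping the best one of each size. This is only an illustration on simulated data with hypothetical predictors `x1`, `x2` and `x3`; it is not how olsrr is implemented.

```r
set.seed(42)

# Simulated data: x3 is pure noise
n <- 120
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 50 + 300 * d$x1 + 150 * d$x2 + rnorm(n, sd = 20)

predictors <- c("x1", "x2", "x3")
results <- data.frame()

# Fit every subset of predictors of every size
for (k in seq_along(predictors)) {
  for (vars in combn(predictors, k, simplify = FALSE)) {
    fit <- lm(reformulate(vars, response = "y"), data = d)
    results <- rbind(results, data.frame(
      size   = k,
      terms  = paste(vars, collapse = " + "),
      adj_r2 = summary(fit)$adj.r.squared,
      aic    = AIC(fit)
    ))
  }
}

# Best model of each size according to adjusted R-squared
best_per_size <- do.call(rbind, lapply(
  split(results, results$size),
  function(s) s[which.max(s$adj_r2), ]
))
best_per_size
```

The number of candidate models doubles with every extra predictor, which is why dedicated packages are preferable once the formula has more than a handful of terms.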

library(olsrr)
## Warning: package 'olsrr' was built under R version 4.3.2
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:MASS':
## 
##     cement
## The following object is masked from 'package:datasets':
## 
##     rivers
model = Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
                 Resolution.x + Front.camera +
                 Brand * Operating.system
fittness = lm(model, data = training)

best_subsets = ols_step_best_subset(fittness)  # computed once and reused
best_subsets
plot(best_subsets)

It tells us that the best model overall, in terms of both complexity and R-squared, is model 7, by just a little. Let’s see:

# RAM..MB. Resolution.x Front.camera Operating.system Internal.storage..GB.:RAM..MB. Resolution.y:Resolution.x Brand:Operating.system

multipleGoat = lm(Price ~ Internal.storage..GB. * RAM..MB. + Front.camera +
                  Resolution.y * Resolution.x + Brand * Operating.system, 
               data = training) 
summary(multipleGoat) # Adjusted R-squared:  0.586
## 
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Front.camera + 
##     Resolution.y * Resolution.x + Brand * Operating.system, data = training)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -409.90  -34.49   -7.33   13.53  661.09 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       48.88      13.39   3.650 0.000275 ***
## Internal.storage..GB.           -169.15     145.43  -1.163 0.245055    
## RAM..MB.                         254.33      52.24   4.868 1.31e-06 ***
## Front.camera                     -93.05      32.32  -2.879 0.004068 ** 
## Resolution.y                     -59.17      70.86  -0.835 0.403904    
## Resolution.x                     -95.34      58.50  -1.630 0.103459    
## Brand                            -13.43      11.11  -1.209 0.226921    
## Operating.system                 472.07      50.12   9.419  < 2e-16 ***
## Internal.storage..GB.:RAM..MB.  1139.90     210.77   5.408 7.94e-08 ***
## Resolution.y:Resolution.x        626.66     100.38   6.243 6.31e-10 ***
## Brand:Operating.system          -641.95      88.24  -7.275 6.96e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87.1 on 1010 degrees of freedom
## Multiple R-squared:   0.59,  Adjusted R-squared:  0.586 
## F-statistic: 145.4 on 10 and 1010 DF,  p-value: < 2.2e-16
cor(predict(multipleGoat, newdata = testing), testing$Price)^2 # 0.6151373
## [1] 0.6151373

We see that this is in fact the multiple regression model we had already proposed. The diagnostic plots below show an overall improvement.

par(mfrow=c(2,2))
plot(multipleGoat, pch = 23 ,bg='mediumpurple3', cex = 2)

3.2 Other regression models

Now we continue with other statistical learning regression models. First, we set up the cross-validation control and the model we have selected.

ctrl <- trainControl(method = "repeatedcv", 
                     number = 5, repeats = 1)

model = Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
                 Resolution.x + Front.camera +
                 Brand * Operating.system

linFit <- lm(model,
             data=training)

#summary(linFit)

# to save all the predictions obtained:
test_results <- data.frame(price = testing$Price)
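The control object above asks for 5-fold cross-validation repeated once. As a rough illustration of what that resampling does, the following base R sketch runs 5-fold cross-validation by hand on the built-in mtcars data; the formula and the seed are assumptions chosen only for the example and are unrelated to the phone data.

```r
set.seed(1)                       # reproducible fold assignment
k <- 5

# Randomly assign each row of mtcars to one of k folds of near-equal size.
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# For each fold: fit on the other k - 1 folds, score on the held-out fold.
fold_rmse <- sapply(seq_len(k), function(i) {
  fit  <- lm(mpg ~ wt * hp, data = mtcars[folds != i, ])
  pred <- predict(fit, newdata = mtcars[folds == i, ])
  sqrt(mean((mtcars$mpg[folds == i] - pred)^2))
})

cv_rmse <- mean(fold_rmse)        # average held-out RMSE across the folds
cv_rmse
```

The averaged held-out RMSE plays the same role as the RMSE column that caret prints for each tuning value in the sections below.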

3.2.1 Overfitted Linear Regression

alm_tune <- train(model, data = training, 
                  method = "lm", 
                  preProc=c('scale', 'center'),
                  trControl = ctrl)

test_results$alm <- predict(alm_tune, testing)
postResample(pred = test_results$alm,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 127.9314108   0.6151373  59.1487843

Not a bad prediction, but relying on it could be risky because the model may be overfitted to the training set.
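For reference, the three numbers reported by postResample() can be reproduced by hand. The sketch below uses a hypothetical helper, regression_metrics(), and made-up prices; it assumes (as we understand caret's default) that Rsquared is the squared correlation between predictions and observations. It also shows why a high Rsquared alone does not guarantee accurate predictions.

```r
# Hypothetical helper mirroring the three numbers printed by postResample().
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),   # root mean squared error
    Rsquared = cor(pred, obs)^2,             # squared Pearson correlation
    MAE      = mean(abs(obs - pred)))        # mean absolute error
}

obs  <- c(100, 150, 200, 250, 300)           # made-up prices
pred <- 2 * obs + 50                         # perfectly correlated, badly biased

m <- regression_metrics(pred, obs)

# R-squared defined as 1 - SSE / SST penalises the bias instead.
r2_sse <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

m
r2_sse
```

In this toy case the squared correlation is 1 while the error-based R-squared is negative, so the RMSE and MAE columns should always be read together with Rsquared.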

qplot(test_results$alm, test_results$price) + 
  labs(title="Linear Regression Observed VS Predicted", x="Predicted", y="Observed") +
  geom_abline(intercept = 0, slope = 1, colour = "blue") +
  theme_bw()
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

3.2.2 Forward Regression

for_tune <- train(model, data = training, 
                  method = "leapForward", 
                  preProc=c('scale', 'center'),
                  tuneGrid = expand.grid(nvmax = 4:10),
                  trControl = ctrl)

for_tune
## Linear Regression with Forward Selection 
## 
## 1021 samples
##    7 predictor
## 
## Pre-processing: scaled (10), centered (10) 
## Resampling: Cross-Validated (5 fold, repeated 1 times) 
## Summary of sample sizes: 818, 816, 816, 816, 818 
## Resampling results across tuning parameters:
## 
##   nvmax  RMSE      Rsquared   MAE     
##    4     91.05288  0.5593826  51.00382
##    5     88.55225  0.5863609  49.71501
##    6     89.10925  0.5827839  50.06133
##    7     88.69971  0.5860553  50.15300
##    8     88.87251  0.5844075  50.36017
##    9     88.78076  0.5853693  50.25454
##   10     88.67707  0.5863621  50.19712
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was nvmax = 5.
plot(for_tune)

We can see that 5 predictors gave the lowest cross-validated RMSE (88.55), so we will use 5 for our prediction.

coef(for_tune$finalModel, for_tune$bestTune$nvmax)
##                    (Intercept)                       RAM..MB. 
##                      124.17728                       16.45963 
##               Operating.system Internal.storage..GB.:RAM..MB. 
##                       60.55494                       47.06839 
##      Resolution.y:Resolution.x         Brand:Operating.system 
##                       49.84204                      -47.06419
# We use those variables for our prediction
test_results$frw <- predict(for_tune, testing)
postResample(pred = test_results$frw,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 128.9845147   0.6030451  60.3293853
qplot(test_results$frw, test_results$price) + 
  labs(title="Forward Regression Observed VS Predicted", x="Predicted", y="Observed") +
  geom_abline(intercept = 0, slope = 1, colour = "blue") +
  theme_bw()

However, the prediction results are slightly worse: the test R-squared drops from 0.615 to 0.603.
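The idea behind forward selection can be sketched in a few lines of base R. This is a simplified greedy version, not the leaps algorithm that caret calls: the function forward_select() and the mtcars predictors are assumptions made for illustration. At each step it adds the candidate that lowers the residual sum of squares the most.

```r
forward_select <- function(data, response, candidates, max_vars) {
  selected <- character(0)
  while (length(selected) < max_vars && length(candidates) > 0) {
    # Residual sum of squares after adding each remaining candidate in turn.
    rss <- sapply(candidates, function(v) {
      fit <- lm(reformulate(c(selected, v), response = response), data = data)
      sum(residuals(fit)^2)
    })
    best       <- names(which.min(rss))
    selected   <- c(selected, best)          # keep the most helpful variable
    candidates <- setdiff(candidates, best)  # and never revisit it
  }
  selected
}

chosen <- forward_select(mtcars, "mpg",
                         c("wt", "hp", "disp", "qsec", "drat"), 3)
chosen
```

Because a variable is never removed once it has been added, the greedy path can miss the best subset of a given size, which is one reason forward and backward selection may disagree.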

3.2.3 Backward Regression

back_tune <- train(model, data = training, 
                   method = "leapBackward", 
                   preProc=c('scale', 'center'),
                   tuneGrid = expand.grid(nvmax = 4:10),
                   trControl = ctrl)
back_tune
## Linear Regression with Backwards Selection 
## 
## 1021 samples
##    7 predictor
## 
## Pre-processing: scaled (10), centered (10) 
## Resampling: Cross-Validated (5 fold, repeated 1 times) 
## Summary of sample sizes: 817, 817, 817, 817, 816 
## Resampling results across tuning parameters:
## 
##   nvmax  RMSE      Rsquared   MAE     
##    4     89.35453  0.5567866  49.48622
##    5     89.61526  0.5542371  49.83512
##    6     88.66141  0.5654053  49.73466
##    7     88.93745  0.5643857  50.05012
##    8     89.05671  0.5645377  50.06268
##    9     88.89159  0.5660702  50.12095
##   10     88.80189  0.5662878  49.97157
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was nvmax = 6.
plot(back_tune)

Now the optimal value is nvmax = 6, close to the 7 terms chosen by the best-subset search. Again, the models with more variables tend to predict the price slightly better.

coef(back_tune$finalModel, back_tune$bestTune$nvmax)
##                    (Intercept)                       RAM..MB. 
##                      124.17728                       29.30339 
##                   Front.camera               Operating.system 
##                      -15.23130                       59.48330 
## Internal.storage..GB.:RAM..MB.      Resolution.y:Resolution.x 
##                       45.18856                       49.27088 
##         Brand:Operating.system 
##                      -46.40438
test_results$bw <- predict(back_tune, testing)
postResample(pred = test_results$bw,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 127.0414554   0.6181511  59.3245515
qplot(test_results$bw, test_results$price) + 
  labs(title="Backward Regression Observed VS Predicted", x="Predicted", y="Observed") +
  geom_abline(intercept = 0, slope = 1, colour = "blue") +
  theme_bw()

This improves on our previous best: the test R-squared is now 0.618, so this is our best model so far.

3.2.4 Stepwise Regression

step_tune <- train(model, data = training, 
                   method = "leapSeq", 
                   preProc=c('scale', 'center'),
                   tuneGrid = expand.grid(nvmax = 4:10),
                   trControl = ctrl)
plot(step_tune)

# which variables are selected?
coef(step_tune$finalModel, step_tune$bestTune$nvmax)
##                    (Intercept)               Operating.system 
##                      124.17728                       60.09993 
## Internal.storage..GB.:RAM..MB.      Resolution.y:Resolution.x 
##                       57.39614                       55.95360 
##         Brand:Operating.system 
##                      -47.48937
test_results$seq <- predict(step_tune, testing)
postResample(pred = test_results$seq,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 126.6029866   0.6121991  59.1994080
qplot(test_results$seq, test_results$price) + 
  labs(title="Stepwise Regression Observed VS Predicted", x="Predicted", y="Observed") +
  geom_abline(intercept = 0, slope = 1, colour = "blue") +
  theme_bw()

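Slightly worse results: the test R-squared (0.612) is below that of backward regression (0.618), although the RMSE is marginally lower (126.6 versus 127.0).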

3.2.5 Ridge Regression

Using glmnet we get:

# X matrix
X = model.matrix(model, data=training)

# y variable
y = training$Price

grid = seq(0, .1, length = 100)  # a 100-size grid for lambda (rho in slides)
ridge.mod = glmnet(X, y, alpha=0, lambda=grid)  # alpha=0 for ridge regression

#dim(coef(ridge.mod))
#coef(ridge.mod)

plot(ridge.mod, xvar="lambda")

ridge.cv = cv.glmnet(X, y, type.measure="mse", alpha=0)
plot(ridge.cv)

opt.lambda <- ridge.cv$lambda.min
opt.lambda # 8.96
## [1] 8.963584
# coefficients at lambda.min, the same lambda that is used for the prediction
lambda.index <- which(ridge.cv$lambda == opt.lambda)
beta.ridge <- ridge.cv$glmnet.fit$beta[, lambda.index]
#beta.ridge

And the prediction obtained is

X.test = model.matrix(model, data=testing)

ridge.pred = predict(ridge.cv$glmnet.fit, s=opt.lambda, newx=X.test)

y.test = testing$Price

postResample(pred = ridge.pred,  obs = y.test)
##        RMSE    Rsquared         MAE 
## 132.2043352   0.5934851  61.3673307

The R-squared obtained is almost 60% and the RMSE is 132, so this prediction is worse than those of the subset-selection models above.

Therefore, we will now try ridge regression using caret:

ridge_grid <- expand.grid(lambda = seq(0, .1, length = 100))

ridge_tune <- train(model, data = training,
                    method='ridge',
                    preProc=c('scale','center'),
                    tuneGrid = ridge_grid,
                    trControl=ctrl)
plot(ridge_tune) 

From this curve we can read off the optimal lambda within the searched grid (0 to 0.1); the exact value is given by ridge_tune$bestTune.

# the best tune
ridge_tune$bestTune
# prediction
test_results$ridge <- predict(ridge_tune, testing)

postResample(pred = test_results$ridge,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 128.6794953   0.6089725  60.0268921

The results obtained are nearly the same as the glmnet ones (R-squared 0.609 versus 0.593, RMSE 128.7 versus 132.2).
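To make the role of lambda concrete, here is a small base R sketch of the closed-form ridge estimate on standardised predictors. It uses mtcars and a hypothetical helper ridge_coef(); note that the scale of lambda here differs from the one used by glmnet and by caret's ridge method, so the values are not comparable with the grids above.

```r
# Standardise predictors and centre the response so no intercept is needed.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# Closed-form ridge estimate: (X'X + lambda * I)^(-1) X'y.
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
round(coefs, 3)

# The overall size of the coefficient vector shrinks as lambda grows.
coef_norm <- sqrt(colSums(coefs^2))
coef_norm
```

With lambda = 0 the estimate coincides with ordinary least squares, and larger values pull all coefficients towards zero without setting any of them exactly to zero.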

3.2.6 The Lasso

lasso_grid <- expand.grid(fraction = seq(.01, 1, length = 100))

lasso_tune <- train(model, data = training,
                    method='lasso',
                    preProc=c('scale','center'),
                    tuneGrid = lasso_grid,
                    trControl=ctrl)
plot(lasso_tune)

lasso_tune$bestTune
test_results$lasso <- predict(lasso_tune, testing)
postResample(pred = test_results$lasso,  obs = test_results$price)
##       RMSE   Rsquared        MAE 
## 130.259396   0.603403  60.045303

Again, the R-squared is about 60% and the RMSE is 130.
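Unlike ridge, the lasso can set coefficients exactly to zero. The following base R sketch illustrates this with a plain cyclic coordinate descent on mtcars. It is an assumption-laden illustration, not what caret runs: caret's lasso method is tuned through the fraction parameter, whereas the hypothetical lasso_cd() below is parameterised directly by a penalty lambda.

```r
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Minimise (1 / (2n)) * RSS + lambda * sum(|beta|) by cyclic coordinate descent.
# Assumes the columns of X are standardised and y is centred.
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  beta <- setNames(numeric(ncol(X)), colnames(X))
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      partial <- y - X[, -j, drop = FALSE] %*% beta[-j]   # residual without x_j
      rho     <- sum(X[, j] * partial) / n
      beta[j] <- soft_threshold(rho, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp", "qsec")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# One column of coefficients per penalty value.
sapply(c(0.1, 1, 6), function(l) round(lasso_cd(X, y, l), 3))
```

As the penalty grows, more coefficients are dropped, and for a sufficiently large lambda every coefficient is zero; this is the variable-selection behaviour that distinguishes the lasso from ridge regression.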

3.2.7 Elastic Net

elastic_grid = expand.grid(alpha = seq(0, .2, 0.01), lambda = seq(0, .1, 0.01))

glmnet_tune <- train(model, data = training,
                     method='glmnet',
                     preProc=c('scale','center'),
                     tuneGrid = elastic_grid,
                     trControl=ctrl)

plot(glmnet_tune)

glmnet_tune$bestTune
test_results$glmnet <- predict(glmnet_tune, testing)

postResample(pred = test_results$glmnet,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 128.3381415   0.6137093  59.4068604

No improvement in the prediction: the R-squared (0.614) stays below the 0.618 of backward regression.
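For reference, the two tuning parameters alpha and lambda enter the elastic net through its penalised objective. The form below is the one we understand glmnet to document for a Gaussian response, with N observations, intercept \beta_0 and coefficient vector \beta:

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;
\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Setting alpha = 0 gives ridge regression and alpha = 1 gives the lasso. Since the grid above only covers alpha between 0 and 0.2, the search stays close to the ridge end, which may explain why the result is so similar to the ridge one.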

3.3 Machine Learning Tools

Now, we will try machine learning models to see if we can improve our last best model (Backward Regression with R-squared = 0.618).

3.3.1 kNN

knn_tune <- train(model, 
                  data = training,
                  method = "kknn",   
                  preProc=c('scale','center'),
                  tuneGrid = data.frame(kmax=c(11,13,15,19,21),
                                        distance=2 ,
                                        kernel='optimal'),
                  trControl = ctrl)
plot(knn_tune)

test_results$knn <- predict(knn_tune, testing)

postResample(pred = test_results$knn,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 135.4068769   0.5860647  57.7535950

Worse than the statistical learning models (R-squared = 0.59 and RMSE = 135).
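As a reminder of how kNN regression produces a prediction, here is a minimal unweighted version in base R. It is a simplification: the kknn method used above weights the neighbours with a kernel, whereas the hypothetical knn_regress() below takes a plain average. The data are the built-in mtcars and the train/hold-out split is arbitrary.

```r
# Predict each query row as the mean response of its k nearest training rows.
knn_regress <- function(train_x, train_y, query_x, k) {
  apply(query_x, 1, function(q) {
    d <- sqrt(colSums((t(train_x) - q)^2))   # Euclidean distance to every row
    mean(train_y[order(d)[seq_len(k)]])
  })
}

X <- scale(as.matrix(mtcars[, c("wt", "hp")]))  # scale so no feature dominates
y <- mtcars$mpg

train <- 1:24
pred  <- knn_regress(X[train, ], y[train], X[-train, , drop = FALSE], k = 5)

sqrt(mean((y[-train] - pred)^2))                # hold-out RMSE
```

Because each prediction is an average of observed training responses, kNN cannot predict outside the range of prices seen in training, which can hurt on the most expensive phones.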

3.3.2 Random Forest

rf_tune <- train(model, 
                 data = training,
                 method = "rf",
                 preProc=c('scale','center'),
                 trControl = ctrl,
                 ntree = 100,
                 tuneGrid = data.frame(mtry=c(1,3,5,7)),
                 importance = TRUE)

plot(rf_tune)

test_results$rf <- predict(rf_tune, testing)

postResample(pred = test_results$rf,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 132.0761717   0.5894287  54.5428281

No improvement, the R-squared is 0.59 and the RMSE is 132.

3.3.3 Gradient Boosting

xgb_tune <- train(model, 
                  data = training,
                  method = "xgbTree",
                  preProc=c('scale','center'),
                  objective="reg:squarederror",
                  trControl = ctrl,
                  tuneGrid = expand.grid(nrounds = c(500,1000), 
                                         max_depth = c(5,6,7), 
                                         eta = c(0.01, 0.1, 1),
                                         gamma = c(1, 2, 3), 
                                         colsample_bytree = c(0.5, 1),  # column fraction per tree, must lie in (0, 1]
                                         min_child_weight = c(1), 
                                         subsample = c(0.2,0.5,0.8)))
## [09:47:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:48] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:20:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:21:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:48] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:22:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:23:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:24:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
test_results$xgb <- predict(xgb_tune, testing)

postResample(pred = test_results$xgb,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 135.4803666   0.5909701  52.6069315

The R-squared obtained is about 0.59 and the RMSE is about 135. This model has the lowest MAE of the individual regression models, but it is very time-consuming to train.
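As a rough illustration of what the three numbers returned by postResample mean, the sketch below recomputes them by hand in base R. The helper name `regression_metrics` and the toy vectors are hypothetical and are not part of the study. It assumes the R-squared is the squared Pearson correlation between predictions and observations, which is, to our understanding, how caret computes it by default.

```r
# Hypothetical helper (not from the study): the three metrics computed by hand.
regression_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),   # root mean squared error
    Rsquared = cor(pred, obs)^2,             # squared Pearson correlation (assumed definition)
    MAE      = mean(abs(pred - obs)))        # mean absolute error
}

# Toy values, not the smartphone data set
obs  <- c(100, 200, 300, 400)
pred <- c(110, 190, 320, 380)

metrics <- regression_metrics(pred, obs)
print(metrics)
```

Because the RMSE squares the errors, a few large misses raise it much more than they raise the MAE, which is one reason the two metrics can rank models differently.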

3.4 Ensemble and Final Prediction

apply(test_results[-1], 2, function(x) mean(abs(x - test_results$price)))
##      alm      frw       bw      seq    ridge    lasso   glmnet      knn 
## 59.14878 60.32939 59.32455 59.19941 60.02689 60.04530 59.40686 57.75360 
##       rf      xgb 
## 54.54283 52.60693
# Combination
test_results$comb = (test_results$xgb + test_results$bw)/2

postResample(pred = test_results$comb,  obs = test_results$price)
##        RMSE    Rsquared         MAE 
## 127.1561924   0.6565751  54.3606503

We obtained the best model by averaging the predictions of the gradient boosting (xgb) and backward regression (bw) models. This gives the best RMSE and R-squared so far, although the MAE is slightly higher than that of xgb alone:

  • RMSE = 127

  • R-squared = 0.657

  • MAE = 54

Therefore, for the final prediction we are going to use the ensemble regression.
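The idea behind the combination can be sketched with a toy example: when two models make errors that partly cancel each other, their simple average can have a lower RMSE than either model alone. The vectors `pred_a` and `pred_b` and the helper `rmse` below are hypothetical and chosen only to make the effect visible. An improvement is not guaranteed in general.

```r
# Hypothetical helper: root mean squared error
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Toy values, not the smartphone data set
obs    <- c(100, 200, 300, 400)
pred_a <- obs + c( 20, -20,  20, -20)   # errors of a first model
pred_b <- obs + c(-10,  30, -30,  10)   # errors of a second model, partly opposite in sign

# Simple average of the two predictions, as in (xgb + bw) / 2
pred_comb <- (pred_a + pred_b) / 2

print(c(a = rmse(pred_a, obs), b = rmse(pred_b, obs), comb = rmse(pred_comb, obs)))
```

In this toy case the averaged errors are all 5 in absolute value, so the combined RMSE is well below that of either input. With real models the gain depends on how correlated their errors are.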

yhat = test_results$comb

head(yhat)
## [1] 526.5573 681.3712 174.0890 515.7914 359.2260 459.1025
hist(yhat, col="lightblue")

3.4.1 Prediction Intervals

y = test_results$price
error = y-yhat
hist(error, col="lightblue")

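# Use the first 100 test errors to estimate the error quantiles
noise = error[1:100]

# 90% prediction interval: shift each prediction by the 5% and 95% error quantiles
lwr = yhat[101:length(yhat)] + quantile(noise, 0.05, na.rm=TRUE)
upr = yhat[101:length(yhat)] + quantile(noise, 0.95, na.rm=TRUE)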

predictions = data.frame(real=y[101:length(y)],
                         fit=yhat[101:length(yhat)],
                         lwr=lwr,
                         upr=upr)

predictions = predictions %>% mutate(out=factor(if_else(real<lwr | real>upr,1,0)))

# how many real observations are out of the intervals?
mean(predictions$out==1)
## [1] 0.05462185
ggplot(predictions, aes(x=fit, y=real))+
  geom_point(aes(color=out)) + theme(legend.position="none") +
  geom_ribbon(data=predictions,aes(ymin=lwr,ymax=upr),alpha=0.3) +
  labs(title = "Prediction intervals", x = "prediction",y="real price")

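We can see that only about 5.5% of the real prices fall outside the 90% prediction intervals, which is below the nominal 10%. This suggests that the intervals are not too narrow and that the regression model behaves reasonably well on this sample.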
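The interval construction can be summarised as: take the errors on a held-out calibration slice, compute their empirical quantiles, and shift each remaining prediction by those quantiles. A minimal sketch on simulated data follows. The helper `empirical_interval`, the simulated prices, and the noise level are all assumptions made for illustration, not values from the study.

```r
# Hypothetical helper: intervals from empirical quantiles of calibration errors
empirical_interval <- function(fit, calib_error, level = 0.90) {
  alpha <- 1 - level
  q <- quantile(calib_error, c(alpha / 2, 1 - alpha / 2),
                na.rm = TRUE, names = FALSE)
  data.frame(fit = fit, lwr = fit + q[1], upr = fit + q[2])
}

# Simulated predictions and prices (not the smartphone data set)
set.seed(1)
n     <- 1000
yhat  <- runif(n, 100, 900)
y     <- yhat + rnorm(n, sd = 50)
error <- y - yhat

calib_idx <- 1:100      # errors used to estimate the quantiles
eval_idx  <- 101:n      # observations on which coverage is checked

intervals <- empirical_interval(yhat[eval_idx], error[calib_idx])

# Share of real values falling outside the intervals
outside <- mean(y[eval_idx] < intervals$lwr | y[eval_idx] > intervals$upr)
print(outside)
```

With a 90% level, the share outside the intervals should be roughly 10% when the calibration errors are representative. A clearly smaller share points to conservative intervals, and a clearly larger one to intervals that are too narrow.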

3.5 Conclusion

Models with 6 or 7 variables behaved well, which makes sense, since around 2 or 3 pairs of variables were fundamental in describing the price.

In the end, the best approach was to combine:

  • A sequential model, gradient boosting, in which each new model improves on the previous ones. This machine learning method is prone to overfitting, but thanks to the feature engineering it not only caused no problems but also improved the overall model.

  • Backward regression, which depended more on our understanding of the variables and of the concepts behind smartphone pricing. We found a few pairs of variables that helped us create a model almost as good as the machine learning models.

To sum up, combining statistical and machine learning procedures, along with proper handling of the data, allowed us to build a model that explains about 66% of the variance in smartphone prices on the test set.